md5sum issues

Ivan Calandra-5
Dear useRs,

I have some kind of a weird issue with md5sum() and I'm not sure where I
should start.

I have a repository on GitHub, with a local Git installation and
connected with RStudio.
I am working on Windows 10 and a colleague of mine works on Linux.
We both pull the latest commits of all files, but the checksums are
different.
Even stranger (to me at least), I get a different checksum from the
local file (downloaded through Git via pulling) and the same file that I
manually download from GitHub. The checksum of the manual download from
GitHub is the same as that of my colleague on Linux.
This happens to all text-based files (Rmd, MD, CSV...) but not to
non-editable files (PDF, XLSX...).

For example (I have shortened the paths):
 > library(tools)

 > md5sum(file.choose()) # local repo
D:\\...\\SSFAcomparisonPaper\\README.md
"e3b08fc2ab8b3c8b57e681f862a77f32"

 > md5sum(file.choose()) # downloaded from GitHub
C:\\Users\\...\\Downloads\\README.md
"05fab51e18b962a9f3266c7b79016ce6"

 > md5sum(file.choose()) # local repo
D:\\...\\SSFAcomparisonPaper\\...\\SSFA_GuineaPigs_plot.pdf
"d9b331642bfd0d192e4eff5808b2a30f"

 > md5sum(file.choose()) # downloaded from GitHub
C:\\Users\\...\\Downloads\\SSFA_GuineaPigs_plot.pdf
"d9b331642bfd0d192e4eff5808b2a30f"

I am not sure whether it is an issue with the algorithm of md5sum() or
whether it is an R/RStudio/Git/GitHub/Windows issue, so I would be
grateful if you could help me sort it out.

Thank you in advance,
Ivan

--

Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: md5sum issues

Jeff Newmiller
Sounds like a newline discrepancy issue. Highly unlikely to be an R issue.

--
Sent from my phone. Please excuse my brevity.

Re: md5sum issues

Ivan Calandra-5
Thank you Jeff for the pointer.

If it's not an R issue, I guess it will be difficult to solve...
But maybe there is a workaround using R, like using another function or
editing the files...? Does anyone have any idea?

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 02/02/2021 17:05, Jeff Newmiller wrote:

> Sounds like a newline discrepancy issue. Highly unlikely to be an R issue.

Re: md5sum issues

Ivan Krylov
In reply to this post by Ivan Calandra-5
On Tue, 2 Feb 2021 17:01:05 +0100
Ivan Calandra <[hidden email]> wrote:

> This happens to all text-based files (Rmd, MD, CSV...) but not to
> non-editable files (PDF, XLSX...).

This is probably caused by Git helpfully converting text files from LF
(0x0A) line endings to CR LF (0x0D 0x0A) when checking out the
repository clone on Windows (and back when checking in).

This configuration option is described in Pro Git:
https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_core_autocrlf
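
The effect can be reproduced without Git. The following is a minimal sketch using base R and the tools package; the temporary files and the helper count_crlf() are made up for illustration and are not part of any package:

```r
library(tools)

# Two throwaway files holding the same two lines of text,
# one with LF and one with CR LF line endings.
lf_file   <- tempfile(fileext = ".md")
crlf_file <- tempfile(fileext = ".md")
writeBin(charToRaw("line one\nline two\n"), lf_file)
writeBin(charToRaw("line one\r\nline two\r\n"), crlf_file)

# The checksums differ although the visible text is identical.
sums <- unname(md5sum(c(lf_file, crlf_file)))
sums[1] == sums[2]   # FALSE

# Hypothetical helper: count CR LF pairs by scanning the raw bytes.
count_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n < 2) return(0L)
  sum(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a))
}

count_crlf(lf_file)    # 0
count_crlf(crlf_file)  # 2
```

Running count_crlf() on the local clone's copy and on the downloaded copy of the same file should show whether line endings account for the different checksums.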

--
Best regards,
Ivan

Re: md5sum issues

Duncan Murdoch-2
On 03/02/2021 2:14 a.m., Ivan Krylov wrote:

> On Tue, 2 Feb 2021 17:01:05 +0100
> Ivan Calandra <[hidden email]> wrote:
>
>> This happens to all text-based files (Rmd, MD, CSV...) but not to
>> non-editable files (PDF, XLSX...).
>
> This is probably caused by Git helpfully converting text files from LF
> (0x0A) line endings to CR LF (0x0D 0x0A) when checking out the
> repository clone on Windows (and back when checking in).
>
> This configuration option is described in Pro Git:
> https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_core_autocrlf

I agree with Ivan K, but don't agree with the advice in that book.

It's best to just leave files alone, not to convert between LF and
CR-LF.  I don't think this confuses many Windows editors these days, but
if your editor forces files into CR-LF form, you should fix the editor,
not try to work around it.

In my opinion everyone should run

  git config --global core.autocrlf false

Some more arguments for this (in the context of GitHub Actions) are here:

https://github.community/t/git-config-core-autocrlf-should-default-to-false/16140

Duncan Murdoch

Re: md5sum issues

Ivan Calandra-5
Thank you Ivan and Duncan for your help.

I understand your point Duncan, but the thing is that I do have an issue
here.
Is it then due to RStudio or even Windows? If it is, I can forget about
a solution on that end, so I would focus on what I can do, and this Git
setting seems to be the best place to start.

Or am I missing something (I am still a newbie on these things...)?

Ivan C

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/02/2021 10:06, Duncan Murdoch wrote:

> I agree with Ivan K, but don't agree with the advice in that book.
>
> It's best to just leave files alone, not to convert between LF and
> CR-LF.  I don't think this confuses many Windows editors these days,
> but if your editor forces files into CR-LF form, you should fix the
> editor, not try to work around it.
>
> In my opinion everyone should run
>
>  git config --global core.autocrlf false
>
> Some more arguments for this (in the context of Github Actions) are here:
>
>
> https://github.community/t/git-config-core-autocrlf-should-default-to-false/16140 
>
>
> Duncan Murdoch

Re: md5sum issues

Duncan Murdoch-2
On 03/02/2021 4:42 a.m., Ivan Calandra wrote:
> Thank you Ivan and Duncan for your help.
>
> I understand your point Duncan, but the thing is that I do have an issue
> here.
> Is it then due to RStudio or even Windows? If it is, I can forget about
> a solution on that end, so I would focus on what I can do, and this Git
> setting seems to be the best place to start.

In my opinion, you should run

  git config --global core.autocrlf false

in an RStudio terminal session.  That will set the git options so they
don't mess up the md5sum values.
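
To see what Git is currently set to before (or after) changing it, the setting can also be queried from the R console. This is only a sketch: get_autocrlf() is a hypothetical helper, and it assumes that a git executable is on the PATH of the R session.

```r
# Hypothetical helper: return Git's global core.autocrlf value, or NA if
# Git is not on the PATH or the option has never been set globally.
get_autocrlf <- function() {
  if (!nzchar(Sys.which("git"))) return(NA_character_)
  out <- suppressWarnings(
    system2("git", c("config", "--global", "--get", "core.autocrlf"),
            stdout = TRUE, stderr = FALSE)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

get_autocrlf()
```

Note that changing the setting does not rewrite files that are already checked out; only files written by Git afterwards are affected.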

You should also go to the RStudio options, and in the Code section,
Saving tab, choose Serialization to be Posix (LF) and default text
encoding to be UTF-8.

Unfortunately, RStudio will still mess up the .Rproj file (see
https://github.com/rstudio/rstudio/issues/1929); there's not much you
can do about that.  Just try not to commit the Windows version to the
repository if any non-Windows users are sharing it.

But do note that other people have different opinions.  They argue that
files should be converted to Windows native format by git.  That works
in some narrow use cases, but as soon as you try to extract a file from
git on one system and work on it on another, it breaks.

Duncan Murdoch



Re: md5sum issues

Ivan Calandra-5
Thank you very much Duncan for your help. I'll try that.

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/02/2021 11:48, Duncan Murdoch wrote:

> In my opinion, you should run
>
>  git config --global core.autocrlf false
>
> in an RStudio terminal session.  That will set the git options so they
> don't mess up the md5sum values.

Re: md5sum issues

Jeff Newmiller
In reply to this post by Duncan Murdoch-2
This CR vs LF vs CRLF newline discrepancy has been around since the 70s and the CP/M operating system. And it remains an issue in over-the-wire internet text protocols today, which actually use the CRLF version like Windows. Sorry, UNIX... world domination of LF encoding failed.

The problem with pretending there is no issue as Duncan is advocating is that text is treated differently than binary, and every time you pretend it isn't it comes back to bite you. Applying binary algorithms like MD5 to text is one of these areas where your expectation that this will be successful is what creates the problem in the first place. A similar issue occurs in file encoding: two files may both contain the word "Hello", but if they are encoded in UTF-16 and UTF-8 respectively then the MD5 results will be different.
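
The encoding point can be illustrated with a short sketch in base R. UTF-16LE (little-endian, without a byte order mark) is chosen here as one concrete 16-bit encoding; that choice and the temporary files are assumptions made for the example only:

```r
library(tools)

# Encode the same word as UTF-8 and as UTF-16LE and write the raw bytes.
word <- "Hello"
utf8_bytes  <- charToRaw(enc2utf8(word))
utf16_bytes <- iconv(word, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

utf8_file  <- tempfile()
utf16_file <- tempfile()
writeBin(utf8_bytes, utf8_file)
writeBin(utf16_bytes, utf16_file)

length(utf8_bytes)   # 5 bytes
length(utf16_bytes)  # 10 bytes

# Same word, different bytes, different checksum.
enc_sums <- unname(md5sum(c(utf8_file, utf16_file)))
enc_sums[1] == enc_sums[2]  # FALSE
```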

Git does not (currently) support differences in encoding, but it does support text vs non-text (newline) differences because they are unavoidable. Pushing forward with your expectation that text files should compare the same in binary by assuming text will always be like UNIX text just defers the problem for another day.

Since I don't know what problem you are actually trying to solve, I cannot offer a concrete solution. But I would begin by not assuming that MD5 works the same on text and binary files... because it doesn't.

--
Sent from my phone. Please excuse my brevity.

Re: md5sum issues

Ivan Calandra-5
Dear Jeff,

If I understood you correctly, it makes sense that I explain more about
my goal here:

I am trying to find ways to have analyses that are as reproducible as
possible (knowing that it is not going to be perfect). One part is to
show which file(s) I use as input and what output was created, so that
potential readers/users of my analysis can check that the file they have
is indeed the same that I use (and not a corrupted or modified version).
Does that make sense?

And for this purpose, I originally used file information (like creation
time and so on), but I quickly realized this doesn't help much. Then I
tried with MD5 and I thought it was solved, but obviously it was not.

Duncan's solution seems to work (I have not fully checked yet, though),
but I am really open to other, more robust alternatives.
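
One possible alternative, which was not proposed in the thread and is sketched here only as an assumption, is to compute the checksum of text files after normalizing the line endings, so that the result no longer depends on how Git checked the file out. md5_normalized() is a hypothetical helper written for this example:

```r
library(tools)

# Hypothetical helper: MD5 of a text file after converting every CR LF pair
# to a single LF. The file on disk is left untouched; the normalized bytes
# are written to a temporary file that is removed afterwards.
md5_normalized <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n >= 2) {
    # TRUE for each CR that is immediately followed by an LF
    cr_before_lf <- c(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a), FALSE)
    bytes <- bytes[!cr_before_lf]
  }
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(bytes, tmp)
  unname(md5sum(tmp))
}

# Demo with two made-up files that differ only in their line endings.
lf_demo   <- tempfile(fileext = ".csv")
crlf_demo <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\n1,2\n"), lf_demo)
writeBin(charToRaw("a,b\r\n1,2\r\n"), crlf_demo)

unname(md5sum(lf_demo)) == unname(md5sum(crlf_demo))   # FALSE
md5_normalized(lf_demo) == md5_normalized(crlf_demo)   # TRUE
```

This should only be applied to files known to be text. In binary files such as PDF or XLSX the byte pair 0x0D 0x0A can occur as data, so those are better hashed as they are with md5sum().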

Thanks for the input!
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/02/2021 17:15, Jeff Newmiller wrote:

> This CR vs LF vs CRLF newline discrepancy has been around since the 70s and the CP/M operating system. And it remains an issue in over-the-wire internet text protocols today, which actually use the CRLF version like Windows. Sorry, UNIX... world domination of LF encoding failed.
>
> The problem with pretending there is no issue as Duncan is advocating is that text is treated differently than binary, and every time you pretend it isn't it comes back to bite you. Applying binary algorithms like MD5 to text is one of these areas where your expectation that this will be successful is what creates the problem in the first place. A similar issue occurs in file encoding: two files may both contain the word "Hello", but if they are encoded in UTF-16 and UTF-8 respectively then the MD5 results will be different.
>
> Git does not (currently) support differences in encoding, but it does support text vs non-text (newline) differences because they are unavoidable. Pushing forward with your expectation that text files should compare the same in binary by assuming text will always be like UNIX text just defers the problem for another day.
>
> Since I don't know what problem you are actually trying to solve, I cannot offer a concrete solution. But I would begin by not assuming that MD5 works the same on text and binary files... because it doesn't.

Re: md5sum issues

Duncan Murdoch-2
In reply to this post by Jeff Newmiller
On 03/02/2021 11:15 a.m., Jeff Newmiller wrote:
> This CR vs LF vs CRLF newline discrepancy has been around since the 70s and the CP/M operating system. And it remains an issue in over-the-wire internet text protocols today, which actually use the CRLF version like Windows. Sorry, UNIX... world domination of LF encoding failed.
>
> The problem with pretending there is no issue as Duncan is advocating

That misrepresents my position.  Obviously there's an issue.  I'm
suggesting a simple solution.

Duncan Murdoch

is that text is treated differently than binary, and every time you
pretend it isn't it comes back to bite you. Applying binary algorithms
like MD5 to text is one of these areas where your expectation that this
will be successful is what creates the problem in the first place. A
similar issue occurs in file encoding.. two files may both contain the
word "Hello" but if they are encoded in UCS16 and UTF8 respectively then
the MD5 results will be different.

>
> Git does not (currently) support differences in encoding, but it does support text vs non-text (newline) differences because they are unavoidable. Pushing forward with your expectation that text files should compare the same in binary by assuming text will always be like UNIX text just defers the problem for another day.
>
> Since I don't know what problem you are actually trying to solve, I cannot offer a concrete solution. But I would begin by not assuming that MD5 works the same on text and binary files... because it doesn't.
>
> On February 3, 2021 2:48:56 AM PST, Duncan Murdoch <[hidden email]> wrote:
>> On 03/02/2021 4:42 a.m., Ivan Calandra wrote:
>>> Thank you Ivan and Duncan for your help.
>>>
>>> I understand your point Duncan, but the thing is that I do have an
>> issue
>>> here.
>>> Is it then due to RStudio or even Windows? If it is, I can forget
>> about
>>> a solution on that end, so I would focus on what I can do, and this
>> Git
>>> setting seems to be the best place to start.
>>
>> In my opinion, you should run
>>
>>   git config --global core.autocrlf false
>>
>> in an RStudio terminal session.  That will set the git options so they
>> don't mess up the md5sum values.
>>
>> You should also go to the RStudio options, and in the Code section,
>> Saving tab, choose Serialization to be Posix (LF) and default text
>> encoding to be UTF-8.
>>
>> Unfortunately, RStudio will still mess up the .Rproj file (see
>> https://github.com/rstudio/rstudio/issues/1929); there's not much you
>> can do about that.  Just try not to commit the Windows version to the
>> repository if any non-Windows users are sharing it.
>>
>> But do note that other people have different opinions.  They argue that
>> files should be converted to Windows native format by git.  That works
>> in some narrow use cases, but as soon as you try to extract a file from
>> git on one system and work on it on another, it breaks.
>>
>> Duncan Murdoch
>>
>>
>>>
>>> Or am I missing something (I am still a newbie on these things...)?
>>>
>>> Ivan C
>>>
>>> On 03/02/2021 10:06, Duncan Murdoch wrote:
>>>> On 03/02/2021 2:14 a.m., Ivan Krylov wrote:
>>>>> On Tue, 2 Feb 2021 17:01:05 +0100
>>>>> Ivan Calandra <[hidden email]> wrote:
>>>>>
>>>>>> This happens to all text-based files (Rmd, MD, CSV...) but not to
>>>>>> non-editable files (PDF, XLSX...).
>>>>>
>>>>> This is probably caused by Git helpfully converting text files from LF
>>>>> (0x0A) line endings to CR LF (0x0D 0x0A) when checking out the
>>>>> repository clone on Windows (and back when checking in).
>>>>>
>>>>> This configuration option is described in Pro Git:
>>>>> https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_core_autocrlf
>>>>>
>>>>
>>>> I agree with Ivan K, but don't agree with the advice in that book.
>>>>
>>>> It's best to just leave files alone, not to convert between LF and
>>>> CR-LF.  I don't think this confuses many Windows editors these days,
>>>> but if your editor forces files into CR-LF form, you should fix the
>>>> editor, not try to work around it.
>>>>
>>>> In my opinion everyone should run
>>>>
>>>>    git config --global core.autocrlf false
>>>>
>>>> Some more arguments for this (in the context of GitHub Actions) are here:
>>>> https://github.community/t/git-config-core-autocrlf-should-default-to-false/16140
>>>>
>>>> Duncan Murdoch
>>>>

Re: md5sum issues

Jeff Newmiller
In reply to this post by Ivan Calandra-5
Well, you can use binary input files like RDS, qs, or parquet. But you already have your code and data in Git, so checking your input is redundant... just put in a binary output reference file and a test that verifies it.
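
One way to read the reference-file suggestion, sketched in shell under assumptions: `reference.bin` stands for a committed reference output and `result.bin` for the output of a fresh run. Both names and their contents are placeholders. In an R workflow one could instead read both files back with `readRDS()` and compare the objects.

```shell
#!/bin/sh
# Hypothetical scratch directory with stand-in files.
dir=$(mktemp -d)
printf '\001\002\003\004' > "$dir/reference.bin"   # stored reference output
printf '\001\002\003\004' > "$dir/result.bin"      # output of the new run

# cmp -s compares byte for byte and reports only through its exit status.
if cmp -s "$dir/reference.bin" "$dir/result.bin"; then
  status="output matches reference"
else
  status="output differs from reference"
fi
echo "$status"
```

The byte-for-byte comparison is only meaningful because the files are binary, so Git's line-ending conversion should leave them alone. This is consistent with the PDF checksums matching at the start of this thread.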

On February 3, 2021 8:25:33 AM PST, Ivan Calandra <[hidden email]> wrote:

>Dear Jeff,
>
>If I understood you correctly, it makes sense that I explain more about
>my goal here:
>
>I am trying to find ways to have analyses that are as reproducible as
>possible (knowing that it is not going to be perfect). One part is to
>show which file(s) I use as input and what output was created, so that
>potential readers/users of my analysis can check that the file they have
>is indeed the same one that I use (and not a corrupted or modified version).
>Does that make sense?
>
>And for this purpose, I originally used file information (like creation
>time and so on), but I quickly realized this doesn't help much. Then I
>tried with MD5 and I thought it was solved, but it was obviously not
>solved.
>
>Duncan's solution seems to work (I have not fully checked yet, though),
>but I am really open to other, more robust alternatives.
>
>Thanks for the input!
>Ivan
>
>On 03/02/2021 17:15, Jeff Newmiller wrote:
>> This CR vs LF vs CRLF newline discrepancy has been around since the
>> 70s and the CP/M operating system. And it remains an issue in
>> over-the-wire internet text protocols today, which actually use the
>> CRLF version like Windows. Sorry, UNIX... world domination of LF
>> encoding failed.
>>
>> The problem with pretending there is no issue as Duncan is advocating
>> is that text is treated differently than binary, and every time you
>> pretend it isn't it comes back to bite you. Applying binary algorithms
>> like MD5 to text is one of these areas where your expectation that this
>> will be successful is what creates the problem in the first place. A
>> similar issue occurs in file encoding: two files may both contain the
>> word "Hello", but if they are encoded in UTF-16 and UTF-8 respectively
>> then the MD5 results will be different.
>>
>> Git does not (currently) support differences in encoding, but it does
>> support text vs non-text (newline) differences because they are
>> unavoidable. Pushing forward with your expectation that text files
>> should compare the same in binary by assuming text will always be like
>> UNIX text just defers the problem for another day.
>>
>> Since I don't know what problem you are actually trying to solve, I
>> cannot offer a concrete solution. But I would begin by not assuming
>> that MD5 works the same on text and binary files... because it doesn't.
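
To make the text-versus-binary point concrete, here is a rough sketch, assuming a POSIX-like shell with `md5sum` and `tr`. The two files are invented stand-ins for an LF copy (as downloaded from GitHub) and a CRLF copy (as checked out on Windows with conversion enabled):

```shell
#!/bin/sh
dir=$(mktemp -d)

# The same two lines of text, once with LF and once with CRLF line endings.
printf 'line one\nline two\n'     > "$dir/lf.txt"
printf 'line one\r\nline two\r\n' > "$dir/crlf.txt"

# Hashing the raw bytes gives two different digests.
md5_lf=$(md5sum < "$dir/lf.txt")
md5_crlf=$(md5sum < "$dir/crlf.txt")
echo "raw LF:   $md5_lf"
echo "raw CRLF: $md5_crlf"

# Deleting every CR before hashing makes both copies hash identically.
norm_lf=$(tr -d '\r' < "$dir/lf.txt" | md5sum)
norm_crlf=$(tr -d '\r' < "$dir/crlf.txt" | md5sum)
echo "normalised LF:   $norm_lf"
echo "normalised CRLF: $norm_crlf"
```

Note that `tr -d '\r'` removes every carriage return, including any that are genuine content, so this is only suitable for plain text files.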

--
Sent from my phone. Please excuse my brevity.


Re: md5sum issues

Ivan Calandra-5
In reply to this post by Ivan Calandra-5
Dear Tim, Jeff, Duncan and Ivan,

Thank you all for your input! Actually, I am already doing what Tim
suggested, and as Jeff said, using the checksums is redundant since I
use Git already (which does a better job).

So I've decided to just remove the checksums from my scripts, and to
revert the RStudio settings to default but use
`git config --global core.autocrlf true`, as mentioned in the Git guide,
for better compatibility with other platforms. I might change that in the
future, but I prefer adjusting Git rather than RStudio because I do not
only use RStudio (I use it for R, but not necessarily or exclusively for
other projects).
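
For anyone wanting to check what such a setting does on their own machine, the following sketch might help. It assumes Git 2.8 or later (for `git ls-files --eol`) and a POSIX-like shell; the scratch repository and `example.txt` are made up for illustration:

```shell
#!/bin/sh
# Scratch repository, so the commands can be tried without touching real work.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf 'a\nb\n' > example.txt
git add example.txt

# Value Git will use in this repository (empty if the option is unset).
autocrlf=$(git config --get core.autocrlf || true)
echo "core.autocrlf: ${autocrlf:-<unset>}"

# Line endings per tracked file: i/ is the index, w/ is the working tree.
eol=$(git ls-files --eol example.txt)
echo "$eol"
```

Running `git ls-files --eol` inside the real repository shows whether the working-tree copies are `w/crlf` while the committed content is `i/lf`, which is the situation that produced the differing checksums.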

Best wishes,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/02/2021 18:07, Ebert,Timothy Aaron wrote:

> Dear Ivan,
> Why not put your data file and analysis program into one folder and then provide enough description (including raw output) to enable someone to run the program (a README file)? This includes all the packages loaded into your version of R, file names, file types, and the logic behind the program. The raw output will help people check that their version is working (as you have indicated). If your data or results are critical, then someone will figure out an update to keep things working on the computers of the future. Good documentation and labeling is a better path to long-term reproducibility than trying to make everything generic. I tend to add documentation within a program as much as writing separate README files.
>     R lends itself to this paradigm more than others, but many programs require that the person first have a copy of some proprietary software. I am not about to buy SPSS, but I will agree in principle that the SPSS code represents reproducible science. If I really care, I will translate the SPSS code into a language that I can use.
>
> Tim
