Community Feedback: Git Repository for R-Devel

Juan Telleria
UNBIASED FACTS:
• Bugzilla & R-devel mailing list: remain unchanged, understood as
ticketing platforms for the bug-fix pull requests on the R-devel Git repository.

• Git repository options:
A) GitHub (cloud-hosted, with automated backups from GitHub to a CRAN server):
https://github.com
B) GitLab (self-hosted on CRAN): https://about.gitlab.com
C) Phabricator (self-hosted on CRAN): https://www.phacility.com
D) Microsoft CodePlex: https://www.codeplex.com
E) Others: unknown

GOOGLE TRENDS:
https://trends.google.com/trends/explore?date=all&q=Git,Svn,Github,Gitlab

EXAMPLE
Core Python's Git repository: https://github.com/python

PERSONAL OPINION / MOTIVATION:
I think that focusing efforts in this direction is important because it would
enable true open-source innovation and open collaboration in R between:
* the R community, and
* R-Core,
for:
* R bug fixes, and
* the core feature wishlist,
as anyone would be able to:
* check the unassigned bugs in Bugzilla (not only R-Core), and
* propose bug fixes themselves as pull requests (citing the
Bugzilla bug ID or the mailing-list thread).
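If pull requests cite Bugzilla reports by ID, the link between a proposed fix and its bug can be checked mechanically. A minimal Python sketch, assuming a hypothetical convention where a pull request mentions a report as "PR#<number>" (the form R's NEWS file uses for bug numbers); the function name referenced_bugs is illustrative, not part of any existing tool:

```python
import re
from typing import List

# Hypothetical convention (an assumption): a pull request cites a Bugzilla
# report as "PR#<number>", the form R's NEWS file uses for bug numbers.
BUG_REF = re.compile(r"\bPR#(\d+)\b")


def referenced_bugs(text: str) -> List[int]:
    """Return the distinct Bugzilla IDs cited in text, in first-seen order."""
    seen: List[int] = []
    for match in BUG_REF.finditer(text):
        bug_id = int(match.group(1))
        if bug_id not in seen:
            seen.append(bug_id)
    return seen


print(referenced_bugs("Fix PR#17345 (see also PR#16000 and PR#17345)."))
# [17345, 16000]
```

A bot could run such a check on each new pull request and post the matching Bugzilla links, keeping Bugzilla as the ticketing platform.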

This would allow _individuals_ from universities or companies
interested in the development of R,
* apart from donating financial resources to the R Foundation,
* to help maintain the core R code themselves.
This aligns with the true spirit of R: developed by contributing
individuals, for individuals.

It would also make it possible to focus on the precise lines of code changed
with each commit, and to revert changes easily, without verbose
e-mails: tidy, clean, maintainable, and fast.

Lastly, I noticed that the R-devel archives do not give each e-mail an ID (a
unique unsigned integer), so, if Git were adopted, it would be a good idea to
add one so that pull requests can reference it.

Juan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Community Feedback: Git Repository for R-Devel

Mark van der Loo
This question has been discussed before on this list:
http://r.789695.n4.nabble.com/Why-R-project-source-code-is-not-on-Github-td4695779.html

See especially Jeroen's answer.

Best,
Mark

In reply to Juan Telleria's message of Thu, 4 Jan 2018, 01:11.

Re: Community Feedback: Git Repository for R-Devel

Juan Telleria
Thank you Mark, this is what I was looking for.

On Sunday I will re-read the facts from the previous discussion in detail and
post the pros and cons here, so that they are kept for the future and the
topic can be closed.

Juan

In reply to Mark van der Loo's message of 4 Jan 2018, 11:06 a.m.

Re: Community Feedback: Git Repository for R-Devel

Juan Telleria
In reply to this post by Mark van der Loo
Here is a basic state-of-the-art review:

##########################################################################################################################################
# State-of-the-Art Analysis of Git vs SVN
##########################################################################################################################################

Scopus Keywords: GIT AND SVN

##########################################################################################################################################
# 1. How Do Centralized (SVN) and Distributed Version Control (GIT) Systems Impact Software Changes? (22 Citations; Published: 2014)
##########################################################################################################################################

1.1 Paper Conclusions

We found that the use of CVCS and DVCS have observable effects on
developers, teams and processes. The most surprising findings are that (i)
the size of commits in DVCS was smaller than in CVCS, (ii) developers split
commits (group changes by intent) more often in DVCS, and (iii) DVCS
commits are more likely to reference issue tracking labels. These show that
DVCS contain higher quality commits compared to CVCS due to their smaller
size, cohesive changes and the presence of issue tracking labels. The
survey provided valuable information on why developers prefer one paradigm
versus the other. DVCS are preferred because of killer features, such as
the ability of committing locally. In contrast CVCS are preferred for their
ease of use and faster learning curve.

1.2 Full Paper

http://dig.cs.illinois.edu/papers/ICSE14_Caius.pdf

##########################################################################################################################################
# 2. Version Control with Git (Book: J. Loeliger, M. McCullough – 2012)
##########################################################################################################################################

2.1 Book Introduction

***The Birth of Git***

Often, when there is discord between a tool and a project, the developers
simply create a new tool. Indeed, in the world of software, the temptation
to create new tools can be deceptively easy and inviting. In the face of
many existing version control systems, the decision to create another
shouldn’t be made casually. However, given a critical need, a bit of
insight, and a healthy dose of motivation, forging a new tool can be
exactly the right course.

Git, affectionately termed “the information manager from hell” by its
creator, is such a tool. Although the precise circumstances and timing of
its genesis are shrouded in political wrangling within the Linux Kernel
community, there is no doubt that what came from that fire is a
well-engineered version control system capable of supporting worldwide
development of software on a large scale.

Prior to Git, the Linux Kernel was developed using the commercial BitKeeper
VCS, which provided sophisticated operations not available in then-current,
free software version control systems such as RCS and CVS. However, when
the company that owned BitKeeper placed additional restrictions on its
“free as in beer” version in the spring of 2005, the Linux community
realized that BitKeeper was no longer a viable solution.

Linus looked for alternatives. Eschewing commercial solutions, he studied
the free software packages but found the same limitations and flaws that
led him to reject them previously. What was wrong with the existing VCS
systems? What were the elusive missing features or characteristics that
Linus wanted and couldn’t find?

***Facilitate distributed development***

There are many facets to “distributed development,” and Linus wanted a new
VCS that would cover most of them. It had to allow parallel as well as
independent and simultaneous development in private repositories without
the need for constant synchronization with a central repository, which
could form a development bottleneck. It had to allow multiple developers in
multiple locations even if some of them were offline temporarily.

***Scale to handle thousands of developers***

It isn’t enough just to have a distributed development model. Linus knew
that thousands of developers contribute to each Linux release, so any new
VCS had to handle a very large number of developers, whether they were
working on the same or on different parts of a common project. And the new
VCS had to be able to integrate all of their work reliably.

***Perform quickly and efficiently***

Linus was determined to ensure that a new VCS was fast and efficient. In
order to support the sheer volume of update operations that would be made
on the Linux Kernel alone, he knew that both individual update operations
and network transfer operations would have to be very fast. To save space
and thus transfer time, compression and “delta” techniques would be needed.
Using a distributed model instead of a centralized model also ensured that
network latency would not hinder daily development.

***Maintain integrity and trust***

Because Git is a distributed revision control system, it is vital to obtain
absolute assurance that data integrity is maintained and is not somehow
being altered. How do you know the data hasn’t been altered in transition
from one developer to the next, or from one repository to the next? For
that matter, how do you know that the data in a Git repository is even what
it purports to be?

Git uses a common cryptographic hash function, called Secure Hash Algorithm 1
(SHA-1), to name and identify objects within its database. Although perhaps
not absolute, in practice it has proven to be solid enough to ensure
integrity and trust for all of Git’s distributed repositories.
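The naming scheme is easy to reproduce. In Git's default SHA-1 object format, a file's contents (a "blob") are named by hashing a short header (the word "blob", a space, the size in bytes and a NUL byte) followed by the raw bytes. A minimal Python sketch; the helper name git_blob_id is ours, and its output can be compared with what "git hash-object <file>" prints:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Name a file's contents the way Git names a blob (SHA-1 object format).

    The hashed bytes are: b"blob ", the decimal length, a NUL byte, the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"test content\n"))   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_blob_id(b"test content!\n"))  # a one-byte change gives an unrelated name
```

Because the name is derived from the bytes themselves, a receiving repository can recompute it and detect any alteration in transit.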

***Enforce accountability***

One of the key aspects of a version control system is knowing who changed
files, and if at all possible, why. Git enforces a change log on every
commit that changes a file. The information stored in that change log is
left up to the developer, project requirements, management, convention,
etc. Git ensures that changes will not happen mysteriously to files under
version control because there is an accountability trail for all changes.

***Immutability***

Git’s repository database contains data objects that are immutable. That
is, once they have been created and placed in the database, they cannot be
modified. They can be recreated differently, of course, but the original
data cannot be altered without consequences. The design of the Git database
means that the entire history stored within the version control database is
also immutable. Using immutable objects has several advantages, including
very quick comparison for equality.
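The immutability of the whole history follows from the same hashing: each commit's bytes include its parent's ID, so a commit's name fixes its entire ancestry, and equality checks reduce to comparing IDs. A toy Python sketch; the commit layout is simplified (real Git commits also record a tree object, author, committer and timestamps), and the names toy_commit and build_history are hypothetical:

```python
import hashlib
from typing import List, Optional, Tuple


def object_id(kind: str, body: bytes) -> str:
    # Header-plus-body SHA-1, the same scheme Git uses to name objects.
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def toy_commit(snapshot: str, parent: Optional[str], message: str) -> str:
    # Simplified commit (assumption): a snapshot ID, the parent ID, a message.
    lines = [f"snapshot {object_id('blob', snapshot.encode('utf-8'))}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode("utf-8"))


def build_history(steps: List[Tuple[str, str]]) -> List[str]:
    """Chain (snapshot, message) pairs into commits; return their IDs, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for snapshot, message in steps:
        parent = toy_commit(snapshot, parent, message)
        ids.append(parent)
    return ids


original = build_history([("v1", "first"), ("v2", "second"), ("v3", "third")])
edited = build_history([("v1", "FIRST"), ("v2", "second"), ("v3", "third")])
# Editing only the oldest message renames every later commit as well.
print([a == b for a, b in zip(original, edited)])  # [False, False, False]
```

Rewriting an old commit does not modify it in place; it produces a new chain of objects with new names, which every other repository can notice.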

***Atomic transactions***

With atomic transactions, a number of different but related changes are
performed either all together or not at all. This property ensures that the
version control database is not left in a partially changed (and hence
possibly corrupted) state while an update or commit is happening. Git
implements atomic transactions by recording complete, discrete repository
states that cannot be broken down into individual or smaller state changes.

***Support and encourage branched development***

Almost all VCSs can name different genealogies of development within a
single project. For instance, one sequence of code changes could be called
“development” while another is referred to as “test.” Each version control
system can also split a single line of development into multiple lines and
then unify, or merge, the disparate threads. As with most VCSs, Git calls a
line of development a branch and assigns each branch a name.

Along with branching comes merging. Just as Linus wanted easy branching to
foster alternate lines of development, he also wanted to facilitate easy
merging of those branches. Because branch merging has often been a painful
and difficult operation in version control systems, it would be essential
to support clean, fast, easy merging.
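Easy merging rests on finding where two branches diverged: a three-way merge compares both branch tips against their best common ancestor, which Git reports with "git merge-base". A toy Python sketch of that search over a commit graph given as a dictionary from each commit to its parents; the commit names are made up, and the quadratic filtering favours clarity over speed:

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """Every commit reachable from `commit` by following parent links, itself included."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors not behind another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(other != c and c in ancestors(parents, other) for other in common)
    }


# root -- base -- f1 -- f2   (feature branch)
#           |
#           +---- m1         (main branch)
graph = {"root": [], "base": ["root"], "f1": ["base"], "f2": ["f1"], "m1": ["base"]}
print(merge_bases(graph, "f2", "m1"))  # {'base'}
```

With "base" known, a merge only has to reconcile what each branch changed since "base", rather than comparing the two tips line by line.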

***Complete repositories***

So that individual developers needn’t query a centralized repository server
for historical revision information, it was essential that each repository
have a complete copy of all historical revisions of every file.

***A clean internal design***

Even though end users might not be concerned about a clean internal design,
it was important to Linus and ultimately to other Git developers as well.
Git’s object model has simple structures that capture fundamental concepts
for raw data, directory structure, recording changes, etc. Coupling the
object model with a globally unique identifier technique allowed a very
clean data model that could be managed in a distributed development
environment.

***Be free, as in freedom***

’Nuff said.

Given a clean slate to create a new VCS, many talented software engineers
collaborated and Git was born. Necessity was the mother of invention again!

2.2 Book Link

https://books.google.es/books?hl=en&lr=&id=aM7-Oxo3qdQC&oi=fnd&pg=PR3&dq=GIT+SVN&ots=39uhIKPlpc&sig=PmxABWMem-h4Fp1-JR-4C2HTwUY&redir_esc=y#v=onepage&q=GIT%20SVN&f=false

Chapter 18, “Using Git with Subversion Repositories”, is of special
interest.

The full book can be found with a basic Google search:

“Version Control with Git” filetype:pdf

Juan

Re: Community Feedback: Git Repository for R-Devel

Juan Telleria
########################################################################################################################################
# R-devel Archives: “Why R-project source code is not on Github” [Summary: Aug 2014]
########################################################################################################################################

Key quotations (pros and cons) from the R-devel archives

########################################################################################################################################
# GIT PROS
########################################################################################################################################

1. [Simon Urbanek R-devel Aug 21, 2014] Github just makes it much
easier to create and post patches to the project - it has nothing to
do with write access - typically on Github the community has no write
access, either.

2. [Simon Urbanek R-devel Aug 21, 2014] Using pull requests is
certainly much less fragile than e-mails and patches are based on
forked branches, so you can directly build the patched version if you
want without manually applying the patch - and you see the whole
history so you can pick out things logically.

3. [Simon Urbanek R-devel Aug 21, 2014] You can comment on individual
patches to discuss them and even individual commits - often leading to
a quick round trip time of revising it.

4. [Yihui Xie-2 R-devel Aug 22, 2014] Sometimes the patches are not
worth emails back and forth, such as the correction of typos. I cannot
think of anything else that is more efficient than being able to
discuss the patch right in the lines of diff's.

5. [Gaurav Sehrawat R-devel Aug 24, 2014] Bridging gap between web2.0
and web1.0 development methodologies & thus passing code to younger
generation.

6. [Jeroen Ooms. R-devel Aug 24, 2014] By now all activity of r-base
[1] cran [2] and r-forge [3] is
continuously mirrored on Github, which already gives unprecedented
insight in developments. At least several r-core members [4,5,6,7,8]
have been spotted on Github, and this year's useR2014 website [9] was
developed and hosted completely on Github. It seems like a matter of
time until the benefits outweigh the cost of a migration, even to the
more conservative stakeholders.

7. [Spencer Graves-2 R-devel Aug 24, 2014] We could use Git without
Github (Gitlab, …)

8. [Spencer Graves-2 R-devel Aug 24, 2014] It should be easy and cheap
for someone to program a server to make daily backup copies of
whatever we want from Github.  This could provide an insurance policy
in case events push the group to leave Github.

9. [Brian Rowe R-devel Aug 24, 2014] One thing to note about git vs
svn is that each git repository is a complete repository containing
the full history, so despite github acting as a central repository, it
is not the same as a central svn repository. In svn the central
repository is typically the only repository with a complete revision
history, but that is not the case with git.

10. [Simon Urbanek R-devel Aug 25, 2014] There is no point in using
git alone (Github actually supports direct SVN access as well).

11. [Simon Urbanek R-devel Aug 25, 2014] Github: The whole point are
the collaborative features.

########################################################################################################################################
# GIT CONS
########################################################################################################################################

1. [Marc Schwartz R-devel Aug 21, 2014] Since the current SVN based
system works well for them (R Core) and provides restricted write
access that they can control, there is no motivation to move to an
alternative version control system unless they would find it to be
superior for their own development processes.

2. [Jeroen Ooms. R-devel Aug 24, 2014] These things take time

3. [Jeroen Ooms. R-devel Aug 24, 2014] However moving development of a
medium sized, 20 year old open source project is not trivial. You are
dealing with a large commit history and many contributors that all
have to overhaul their familiar tools and development practices
overnight.

4. [Jeroen Ooms. R-devel Aug 24, 2014] There is also the
infrastructure of nightly builds and CRAN r-devel package checking
that relies on the svn.

5. [Jeroen Ooms. R-devel Aug 24, 2014] Moreover moving to Github means
changes in communications, for example replacing the current bug
tracking system to Github "issues".

6. [Jeroen Ooms. R-devel Aug 24, 2014] In addition, several members
are skeptical about putting source code in the hands of a for-profit
US company, and other legal issues.

7. [Jeroen Ooms. R-devel Aug 24, 2014] The most critical piece of
making such a transition is a detailed proposal outlining what the
migration would involve, the cost/benefits, a planning, and someone
that is willing to take the lead. Only on the basis of such a serious
proposal you can have a discussion in which everyone can voice
concerns, be assured that his/her interests are secure, and the idea
can eventually be put up for a vote.

8. [Kirill Müller. Jan 05, 2018 (With Permission)] Migration Technical Problems:
- Keeping monotonic revision numbers seems to be a requirement for
migrating to GitHub.
- It may be more difficult to apply patches produced by "git
format-patch" or "git diff" (obtained from Winston's GitHub mirror) to
an SVN working copy, because patches created by Git are missing the
SVN base revision. (This is an obstacle to adopting GitHub gradually.)
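For the missing base revision, one workaround is to recover it from the mirror itself. git-svn, the usual tool for mirroring SVN into Git, appends a trailer of the form "git-svn-id: <url>@<revision> <repository-uuid>" to every converted commit message (unless run with --no-metadata); since those revisions increase along trunk, they would also give mirrored commits the monotonic numbers mentioned above. A minimal Python sketch, assuming the mirror keeps these trailers (to be checked against Winston's mirror before relying on it); the function name and the example URL and UUID are hypothetical, and the message to parse could come from "git log -1 --format=%B <commit>":

```python
import re
from typing import Optional

# Trailer written by git-svn into each converted commit message, e.g.
#   git-svn-id: https://svn.example.org/R/trunk@12345 <repository-uuid>
# (hypothetical URL). Mirrors made with --no-metadata lack it.
GIT_SVN_ID = re.compile(r"^git-svn-id: (\S+)@(\d+) [0-9a-fA-F-]+$", re.MULTILINE)


def svn_base_revision(commit_message: str) -> Optional[int]:
    """Return the SVN revision a mirrored Git commit corresponds to, if recorded."""
    match = GIT_SVN_ID.search(commit_message)
    return int(match.group(2)) if match else None


message = (
    "Fix typo in ?lm\n\n"
    "git-svn-id: https://svn.example.org/R/trunk@12345 "
    "00000000-0000-0000-0000-000000000000\n"
)
print(svn_base_revision(message))  # 12345
```

A patch produced with "git diff" against that commit could then be applied to an SVN working copy updated to the same revision (for example with "svn update -r 12345").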

* NOTE: Paper (Previous Transition Experience):
http://iopscience.iop.org/article/10.1088/1742-6596/898/7/072024/pdf
