Comprehensive Guide to Git and GitHub for Version Control and Collaboration
Git and GitHub are essential tools for developers to manage code efficiently and collaborate effectively on projects. This guide provides an overview of Git (local repository) and GitHub (remote repository), their key functionalities, project structuring in PySpark, and practical steps for using Git and GitHub.
Git: A Local Version Control System
- Version Control: Git versions the code, making it easy to revert to a previous working version if needed.
- Collaboration: Git allows multiple developers to work on a shared codebase, develop features, and merge their work into the final code.
GitHub: A Remote Platform for Collaboration
GitHub is an online platform for hosting, tracking, and collaborating on projects. It acts as a central repository where project teams can store and manage their code.
Basic Project Structure in PySpark
- Conf Folder: Contains configuration files for the different environments, such as Development, Testing, and Production.
- Lib Folder: Stores reusable, modular code.
- logger.py: Handles logging setup and log-related details.
- main.py: Acts as the entry point of the application, containing the execution sequence and calls to the various functions.
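As a rough illustration of how these pieces might fit together, the sketch below combines a logger helper, an environment-specific configuration loader, and a main entry point in one file. The INI-style layout, the section names (DEV, TEST, PROD), the key app.name, and the function names get_logger, load_conf, and main are all assumptions made for this example; the guide does not prescribe them. PySpark itself is left out so that the sketch runs with the standard library alone.

```python
import configparser
import logging

# Hypothetical contents of conf/project.conf: one section per environment.
SAMPLE_CONF = """
[DEV]
app.name = demo-dev
[TEST]
app.name = demo-test
[PROD]
app.name = demo-prod
"""


def get_logger(name: str) -> logging.Logger:
    """What a logger.py helper might expose: a logger with one console handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def load_conf(text: str, env: str) -> dict:
    """Return the settings of one environment section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[env])


def main(env: str = "DEV") -> dict:
    """Entry point: load configuration, then run the steps in order."""
    log = get_logger("demo")
    conf = load_conf(SAMPLE_CONF, env)
    log.info("Starting %s in %s", conf["app.name"], env)
    # A real main.py would create a SparkSession here and call code from lib/.
    return conf


if __name__ == "__main__":
    main("DEV")
```

In an actual project the three functions would likely live in logger.py, a module under lib, and main.py respectively, and load_conf would read the file from the conf folder instead of an inline string.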
Steps to Create a Project Locally
- Create Project Folder:
    cd ~/Desktop
    mkdir project_name
    cd project_name
- Add Main File:
    vi main.py
- Create Logger File:
    vi logger.py
- Set Up Conf Folder:
    mkdir conf
    cd conf
    vi spark.conf
    vi project.conf
Note: Use IDEs like PyCharm or Visual Studio Code for easier project structuring.
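The steps above can also be scripted. The following is a minimal sketch using pathlib that creates the same files as empty placeholders; the function name create_skeleton is hypothetical, and the demonstration writes into a temporary directory rather than the Desktop so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path
from typing import List


def create_skeleton(root: Path) -> List[str]:
    """Create the folders and empty files from the steps above under root."""
    files = ["main.py", "logger.py", "conf/spark.conf", "conf/project.conf"]
    for relative in files:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)  # like mkdir -p
        target.touch(exist_ok=True)  # create an empty file, keep existing ones
    # Report what now exists, as sorted POSIX-style relative paths.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # A temporary directory keeps the demonstration away from real folders.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_skeleton(Path(tmp) / "project_name"))
```

Because mkdir and touch are called with exist_ok=True, running the function twice on the same folder leaves existing files untouched.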
Installing Git
On Mac:
Install Git using Homebrew:
brew install git
On Windows:
Download Git from git-scm.com. Follow the installation wizard and verify the installation with:
git --version
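If a script depends on Git, the same check can be made from Python before doing anything else. This is a small sketch, assuming only that the git executable, when installed, is reachable through PATH; the function name git_version is made up for the example.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is unavailable."""
    if shutil.which("git") is None:  # no git executable found on PATH
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print(git_version() or "git not found on PATH")
```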
Working with GitHub
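- Create a Remote Repository: On GitHub, go to Repositories -> New.
- Clone Repository Locally:
    mkdir mag-project
    cd mag-project
    git clone <repository-link>
- Push Local Changes to GitHub (for an existing local project that was not created with git clone; a cloned repository is already initialized and linked to origin):
    git init
    git add .
    git commit -m "Initial Commit"
    git remote add origin <URL>
    git push -u origin main
  If the local branch is not already named main, rename it with git branch -M main before pushing.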
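For teams that automate project setup, the first-push sequence could be driven from Python. The sketch below only builds the list of commands and offers a separate helper to run them, so that nothing touches a repository unless run_all is called explicitly. The function names and the example URL are hypothetical, and a branch rename step is included for the case where the local default branch is not already called main.

```python
import subprocess
from typing import List


def first_push_commands(remote_url: str, branch: str = "main") -> List[List[str]]:
    """Build the git commands for publishing a new local project."""
    return [
        ["git", "init"],
        ["git", "add", "."],
        ["git", "commit", "-m", "Initial Commit"],
        ["git", "branch", "-M", branch],  # make the branch name match the push target
        ["git", "remote", "add", "origin", remote_url],
        ["git", "push", "-u", "origin", branch],
    ]


def run_all(commands: List[List[str]], cwd: str) -> None:
    """Run each command inside cwd, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_all(...) yourself once the URL is real.
    for argv in first_push_commands("https://example.com/your-repo.git"):
        print(" ".join(argv))
```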
Important Git Commands
- Initialize Repository:
    git init
- Stage Changes (a single file, or everything in the current directory):
    git add <filename>
    git add .
- Commit Changes:
    git commit -m "Commit message"
- Push Changes:
    git push origin main
- Check Status:
    git status
- Branching
  - Create a branch:
      git branch <branch-name>
  - Switch to a branch:
      git checkout <branch-name>
Handling Merge Conflicts
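Conflicts arise during merges when Git cannot automatically reconcile changes, typically because two branches modified the same lines of a file. Git marks each conflicting region in the file with <<<<<<<, =======, and >>>>>>> lines. Resolve a conflict by manually editing the file to keep the changes you need, removing the rest along with the markers, and then staging and committing the result.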
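To make the conflict markers concrete, here is a small sketch that locates conflict blocks in a file's text and, optionally, keeps one side of each block. It assumes Git's default marker style (no diff3 base section); the function names and the sample file content are invented for the example. Mechanically keeping one side is only a shortcut, since real conflicts often need a hand-merged mix of both sides.

```python
from typing import List, Tuple

# Hypothetical file content after a conflicting merge.
SAMPLE = """\
import logging
<<<<<<< HEAD
LEVEL = logging.INFO
=======
LEVEL = logging.DEBUG
>>>>>>> feature-branch
print(LEVEL)
"""


def find_conflicts(text: str) -> List[Tuple[int, int]]:
    """Return 1-based (start_line, end_line) pairs, one per conflict block."""
    blocks, start = [], None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< "):
            start = number
        elif line.startswith(">>>>>>> ") and start is not None:
            blocks.append((start, number))
            start = None
    return blocks


def resolve(text: str, keep: str = "ours") -> str:
    """Drop the markers and keep one side ("ours" or "theirs") of each block."""
    out, side = [], None  # side is None while outside a conflict block
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None
        elif side is None or side == keep:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(find_conflicts(SAMPLE))
    print(resolve(SAMPLE, keep="theirs"))
```

A check like find_conflicts can be useful before committing, to confirm that no marker lines were left behind in a resolved file.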
Continuous Integration and Deployment (CI/CD)
Most projects keep their source code in a central repository. Platforms like GitHub, GitLab, and Bitbucket host that repository and are widely used as the basis for CI/CD, where each pushed change can trigger automated builds, tests, and deployments.
Reverting Changes
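- Unstaged Changes (discard edits in the working directory):
    git restore <file-name>
- Staged Changes (unstage the file, then discard the edits):
    git restore --staged <file-name>
    git restore <file-name>
- Committed Changes (find the commit to return to, then reset to it):
    git log
    git reset <commit-hash>
  By default, git reset keeps the later changes in the working directory as unstaged edits; add --hard to discard them as well.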
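The three cases above differ in which of Git's areas they rewind: the working directory, the staging area (index), or the branch history. The toy model below tries to make that visible. It is a deliberately simplified illustration with made-up names (ToyRepo and its methods), not how Git stores data, and it assumes a single linear history with whole-file contents.

```python
from typing import Dict, List


class ToyRepo:
    """Toy model: each area maps a file name to its content."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, str]] = [{}]  # commits[0]: empty root snapshot
        self.head = 0                               # position of the current commit
        self.index: Dict[str, str] = {}             # staging area
        self.work: Dict[str, str] = {}              # working directory

    def add(self, name: str) -> None:
        """git add <file-name>: copy the working version into the index."""
        self.index[name] = self.work[name]

    def commit(self) -> int:
        """git commit: snapshot the index and move HEAD to the new commit."""
        self.commits.append(dict(self.index))
        self.head = len(self.commits) - 1
        return self.head

    def restore(self, name: str) -> None:
        """git restore <file-name>: working copy goes back to the index version."""
        self.work[name] = self.index[name]

    def restore_staged(self, name: str) -> None:
        """git restore --staged <file-name>: index goes back to the HEAD version."""
        snapshot = self.commits[self.head]
        if name in snapshot:
            self.index[name] = snapshot[name]
        else:
            self.index.pop(name, None)  # newly added file: simply unstage it

    def reset(self, commit: int) -> None:
        """git reset <commit-hash> (default mode): move HEAD and index, keep work."""
        self.head = commit
        self.index = dict(self.commits[commit])


if __name__ == "__main__":
    repo = ToyRepo()
    repo.work["main.py"] = "v1"
    repo.add("main.py")
    first = repo.commit()
    repo.work["main.py"] = "v2"
    repo.add("main.py")
    repo.commit()
    repo.reset(first)
    print(repo.index["main.py"], repo.work["main.py"])
```

After the reset in the demonstration, the index holds the older content while the working directory still holds the newer one, which mirrors how a default git reset leaves later changes behind as unstaged edits.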