Saturday, January 6, 2024

SharePoint Online Backup with Python

When it comes to managing documents and files, SharePoint is a robust platform used by organizations worldwide. However, as the scale of data grows, manually handling the download and organization of files from multiple SharePoint sites can become a daunting task. Automating this process not only saves time but also reduces the chances of human error and ensures consistency in how files are managed and archived.

In this post, we will explore a Python script designed to streamline downloading files and folders from multiple SharePoint sites. The script automates authentication, downloading, logging, and archiving. Whether you're a system administrator managing company documents, a data analyst gathering data for reports, or a developer integrating SharePoint files into your projects, the script can simplify your workflow and be customized to fit your needs. Let's walk through how it works.

The complete source code is available in the following repository:

https://github.com/rahulrajvn/python-sharepoint-backup

This script is designed to automate the process of downloading files and folders from multiple SharePoint sites, logging the process, and packaging the downloaded content into tar files. Here's a detailed breakdown of its components and functionalities:

1. Import Statements:

import os
import logging
import shutil
import tarfile
from datetime import datetime
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
os, logging, shutil, tarfile: Python standard libraries for operating system interaction, logging, high-level file operations, and tar archive manipulation.
datetime: To work with dates and times.
AuthenticationContext, ClientContext, File: From the office365 package (installed as Office365-REST-Python-Client), used to authenticate and interact with SharePoint.

2. Logging Configuration:


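# Configure global logging
base_local_log_directory = "E:/Download/logs/"  # Directory where log files are saved
# Ensure the log directory exists
os.makedirs(base_local_log_directory, exist_ok=True)
# Timestamp used in log file and download directory names
# (the format is an assumption; adjust as needed)
current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
Sets up a directory to store log files, ensures it exists, and records the timestamp that setup_logger and process_site use when naming log files and download directories.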

3. Logger Setup Function:

def setup_logger(site_name):
    log_file = f"{base_local_log_directory}sharepoint_downloads_{site_name}_{current_time}.log"
    logger = logging.getLogger(site_name)
    logger.setLevel(logging.INFO)
    file_handler = logging.FileHandler(log_file)
    formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    return logger, file_handler

A function to configure and return a logger for each SharePoint site. It creates a log file specific to the site and the current time.
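One thing to watch with this function: logging.getLogger(name) returns the same logger object every time it is called with the same name, so calling setup_logger twice for one site in the same process attaches two handlers and every message is written twice. The following is a minimal sketch of a variant that clears old handlers first. The name make_site_logger and the log_dir and stamp parameters are hypothetical and not part of the original script.

```python
import logging
import os


def make_site_logger(site_name, log_dir, stamp):
    """Return (logger, handler) for one site, replacing any earlier handlers."""
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"sharepoint_downloads_{site_name}_{stamp}.log")

    logger = logging.getLogger(site_name)
    logger.setLevel(logging.INFO)

    # getLogger returns the same object for the same name, so drop handlers
    # left over from a previous call to avoid duplicated log lines.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(asctime)s:%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    return logger, handler
```

Building the path with os.path.join also means the log directory setting works with or without a trailing slash.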

4. SharePoint Site Details:

# List of SharePoint sites
sharepoint_sites = [
    {
        "site_url": "https://<sharepoint-name>.sharepoint.com/sites/first-site",
        "site_base_url": "/sites/first-site/Shared Documents",
        "client_id": "<Application Client ID>",
        "client_secret": "<Client Secret Value>"
    },
    {
        "site_url": "https://<sharepoint-name>.sharepoint.com/sites/second-site",
        "site_base_url": "/sites/second-site/Shared Documents",
        "client_id": "<Application Client ID>",
        "client_secret": "<Client Secret Value>"
    },
    # Add more SharePoint sites as needed
]

A list of dictionaries, each containing the details necessary to access different SharePoint sites.
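Hard-coding client secrets in the script is risky if the file is ever shared or committed to a repository. As a hedged sketch, the site list could live in a JSON file while the secrets come from environment variables. The file layout, the load_sites helper, and the secret_env key are assumptions made for illustration; none of them appear in the original script.

```python
import json
import os


def load_sites(config_path, env=os.environ):
    """Read site entries from JSON and fill in secrets from the environment.

    Each JSON entry is assumed to hold site_url, site_base_url, client_id and
    secret_env (the name of the environment variable holding the secret).
    """
    with open(config_path, encoding="utf-8") as fh:
        entries = json.load(fh)

    sites = []
    for entry in entries:
        var_name = entry["secret_env"]
        if var_name not in env:
            raise KeyError(f"Environment variable {var_name} is not set")
        sites.append({
            "site_url": entry["site_url"],
            "site_base_url": entry["site_base_url"],
            "client_id": entry["client_id"],
            "client_secret": env[var_name],
        })
    return sites
```

The returned list has the same keys as sharepoint_sites, so it could be passed to the rest of the script unchanged.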

5. Download Path Configuration:

# Local path for downloads
base_local_download_path = "E:/Download/data/"  # Directory where backups are stored
# Ensure the directory exists
os.makedirs(base_local_download_path, exist_ok=True)

Sets up a local directory to store the downloaded files and ensures it exists.

6. File Download Function:

# Function to download a file from SharePoint
def download_file(ctx, file_url, local_path):
    response = File.open_binary(ctx, file_url)
    with open(local_path, "wb") as local_file:
        local_file.write(response.content)

Downloads a single file from SharePoint using the provided context (authentication and site details).

7. Recursive Download Function:

# Function to list and download all files and folders from a given folder
def list_and_download_files_and_folders(url, folder_url, local_folder_path, client_id, client_secret):
    # Extract the site name from the URL
    site_name = url.split('/')[-1]
    # Reuse the per-site logger created by setup_logger (same name, same logger object)
    logger = logging.getLogger(site_name)

    context_auth = AuthenticationContext(url)
    if not context_auth.acquire_token_for_app(client_id, client_secret):
        logger.error(f"Authentication failed for {site_name}")
        return
    ctx = ClientContext(url, context_auth)
    web = ctx.web
    folder = web.get_folder_by_server_relative_url(folder_url)
    ctx.load(folder)
    ctx.execute_query()
    print(f"Accessing Folder: {folder.properties['ServerRelativeUrl']}")
    
    # Ensure local folder exists
    if not os.path.exists(local_folder_path):
        os.makedirs(local_folder_path)

    # List and download files in the folder
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    for file in files:
        file_name = file.properties['Name']
        file_url = file.properties['ServerRelativeUrl']
        print(f"Downloading File: {file_name}")
        download_file(ctx, file_url, os.path.join(local_folder_path, file_name))

    # List folders in the folder and recursively list and download
    folders = folder.folders
    ctx.load(folders)
    ctx.execute_query()
    for subfolder in folders:
        folder_name = subfolder.properties['Name']
        subfolder_url = subfolder.properties['ServerRelativeUrl']
        print(f"Accessing Folder: {folder_name}")
        list_and_download_files_and_folders(url, subfolder_url, os.path.join(local_folder_path, folder_name), client_id, client_secret)

Recursively lists and downloads all files and folders from a given SharePoint folder. It uses the app credentials to authenticate against the site and prints its progress to the console as it goes.
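Note that the function above builds a new AuthenticationContext on every recursive call, so a deep folder tree triggers one token request per folder. A possible alternative, sketched here under assumptions, is to authenticate once and pass the resulting ClientContext down the recursion. The name download_tree and its download callback parameter are hypothetical; the calls made on ctx are the same ones the original function uses.

```python
import os


def download_tree(ctx, folder_url, local_folder_path, download):
    """Recursively mirror a SharePoint folder using an already-authenticated ctx.

    `download(ctx, file_url, local_path)` is any callable that saves one file,
    for example the download_file function shown earlier.
    Returns the number of files downloaded.
    """
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)
    ctx.load(folder)
    ctx.execute_query()

    os.makedirs(local_folder_path, exist_ok=True)

    # Download the files that sit directly in this folder
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    count = 0
    for item in files:
        name = item.properties["Name"]
        download(ctx, item.properties["ServerRelativeUrl"],
                 os.path.join(local_folder_path, name))
        count += 1

    # Recurse into each subfolder, reusing the same ctx
    subfolders = folder.folders
    ctx.load(subfolders)
    ctx.execute_query()
    for sub in subfolders:
        count += download_tree(ctx,
                               sub.properties["ServerRelativeUrl"],
                               os.path.join(local_folder_path, sub.properties["Name"]),
                               download)
    return count
```

With the original script's names, the caller would create ctx once after acquire_token_for_app succeeds and then call download_tree(ctx, site_base_url, local_download_path, download_file).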

8. Archiving Function:

# Function to tar a directory
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    logging.info(f"Created tar archive {output_filename}")

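Creates a tar.gz archive of the downloaded directory. Note that logging.info here goes to the root logger rather than the per-site logger, so the "Created tar archive" message does not reach the site's log file.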
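Because the download directory is deleted right after archiving (see process_site in the next section), it can be worth checking the archive before the originals are removed. The sketch below uses only the standard library and compares the file names on disk with those inside the archive. The helper archive_matches_directory is hypothetical and not part of the original script; it assumes the archive was created the way make_tarfile does it, with the directory's base name as the top-level entry.

```python
import os
import tarfile


def archive_matches_directory(tar_path, source_dir):
    """Return True if every regular file under source_dir is in the archive."""
    base = os.path.basename(os.path.normpath(source_dir))

    expected = set()
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), source_dir)
            # Tar member names always use forward slashes.
            expected.add(base + "/" + rel.replace(os.sep, "/"))

    with tarfile.open(tar_path, "r:gz") as tar:
        archived = {m.name for m in tar.getmembers() if m.isfile()}

    return expected <= archived
```

This only checks names, not contents; a stricter check could also compare file sizes from the member metadata.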

9. Site Processing Function:

    
def process_site(site_details):
    site_url = site_details["site_url"]
    site_base_url = site_details["site_base_url"]
    client_id = site_details["client_id"]
    client_secret = site_details["client_secret"]
    # Extract the site name from the URL
    site_name = site_url.split('/')[-1]

    # Local path for this site's downloads
    local_download_path = f"{base_local_download_path}{site_name}_{current_time}"
    
    logger, file_handler = setup_logger(site_name)
    
    # Log the start of the process for this site
    logger.info(f"Starting download script for {site_name}")

    # Download files and folders
    list_and_download_files_and_folders(site_url, site_base_url, local_download_path, client_id, client_secret)

    # Tar the downloaded directory
    tar_filename = f"{local_download_path}.tar.gz"
    make_tarfile(tar_filename, local_download_path)

    # Log the end of the process for this site
    logger.info(f"Download script finished for {site_name}")

    # Remove the directory after making the tar file
    try:
        shutil.rmtree(local_download_path)
        logger.info(f"Successfully removed directory: {local_download_path}")
    except Exception as e:
        logger.error(f"Error removing directory {local_download_path}: {e}")

    
    # Close the file handler to properly close the log file
    file_handler.close()

Orchestrates the download and archiving process for a single SharePoint site. It sets up logging, downloads content, creates a tar archive of the content, and then cleans up the download directory.

10. Main Loop:


for site in sharepoint_sites:
    process_site(site)
Iterates through each SharePoint site and processes it using the process_site function.
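As written, an unhandled exception while processing one site stops the loop, so later sites are never backed up. A minimal sketch of a more forgiving loop follows. The wrapper run_all and the logger name "backup" are hypothetical, and process stands for the process_site function above.

```python
import logging


def run_all(sites, process):
    """Call process(site) for each site; return the URLs of sites that failed."""
    failed = []
    for site in sites:
        try:
            process(site)
        except Exception:
            # Record the traceback and carry on with the next site.
            logging.getLogger("backup").exception("Backup failed for %s", site["site_url"])
            failed.append(site["site_url"])
    return failed
```

Calling run_all(sharepoint_sites, process_site) would then return the list of sites that need attention instead of stopping at the first failure.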

Generalization:


To generalize this script for broader use:

Configuration: Externalize the configuration (like SharePoint site details, local paths) to a config file or environment variables.
Error Handling: Enhance error handling to manage and retry after transient errors.
Logging: Include more detailed logging and potentially different log levels based on the environment (e.g., DEBUG in development, INFO in production).
Authentication: Support different authentication methods if needed.
Modularity: Break down the script into more modular functions or classes, making it easier to update parts of the logic.
User Feedback: Provide real-time feedback or a progress bar when running the script, especially for long downloads.
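For the error-handling point, one common pattern is to retry a failed call with exponential backoff. The sketch below is generic and makes no assumption about which exceptions the office365 library raises. The name retry, its parameters, and the default delays are illustrative choices.

```python
import time


def retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

A single download could then be wrapped as retry(lambda: download_file(ctx, file_url, local_path)). Narrowing retry_on to the exception types that are actually transient in your environment avoids retrying permanent failures such as bad credentials.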

Each section of this script is designed with a specific purpose for interacting with SharePoint, handling files, and logging the process. By understanding and potentially generalizing these components, one can adapt the script for various environments and SharePoint structures.

Securing API Access to SharePoint Online: A Guide to Azure Application Registration and SharePoint App Creation

Gaining API access to a SharePoint environment is essential for developers building applications that interact with SharePoint data. This can be achieved in two primary ways: through Azure Application registration or by creating an application directly in SharePoint. Each method has its own steps and considerations. In this post, we will explore both methods and walk through setting up and securing API access to your SharePoint environment.

Method 1: Azure Application Registration

Azure Active Directory (Azure AD) offers a secure way for applications to access SharePoint through Azure Application registration. Here's how you can set it up:

  1. Navigate to Azure Portal: Start by going to the Azure portal and creating a new application registration. This will represent your application in the directory and will be the basis for its authentication and authorization processes.

  2. Configure Application Settings: Assign a name to your application and configure settings such as supported account types and redirect URIs. These settings will determine how your application interacts with users and other services.

  3. Set API Permissions: The most crucial step is setting the API permissions for your application. Navigate to the "API permissions" section and add permissions for SharePoint. For read-only access, you will typically add permissions such as Sites.Read.All or Files.Read.All. Ensure you understand the scope and implications of each permission you grant.

Method 2: Creating an Application in SharePoint

For those who prefer to work directly within the SharePoint environment, creating an application in SharePoint is a viable alternative. Here's how to do it:

  1. Access SharePoint Admin Center: Log into your SharePoint admin center and navigate to the 'appregnew.aspx' page (https://<main-site name>.sharepoint.com/_layouts/15/appregnew.aspx). This is where you'll register your new application. This step is needed only if you have not already created an Azure AD application.

  2. Generate Client ID & Secret: Click the "Generate" buttons to create a new client ID and secret. These credentials will be used to authenticate your application with SharePoint.

Once the application has been created through Method 1 or Method 2, the next step is to grant permissions to the newly created principal. Since we're granting tenant-scoped permissions, this can only be done via the appinv.aspx page on the tenant administration site. You can reach this page at https://sitename-admin.sharepoint.com/_layouts/15/appinv.aspx

Once the page has loaded, enter your client ID and look up the created principal.



Set Permissions: Define the permissions your application will need by providing the permission XML. For full control permissions, you might use something like:

<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
</AppPermissionRequests>

When you click Create, you'll be presented with a permission consent dialog. Press Trust It to grant the permissions.


Best Practices and Considerations

  • Security: Regardless of the method you choose, security should be your top priority. Treat your client ID and secret as you would any sensitive credentials. Ensure that only authorized personnel have access to this information and that it's stored securely.

  • Permission Scope: Always adhere to the principle of least privilege. Grant only the permissions necessary for your application to function. Excessive permissions can pose a security risk.

  • Maintenance: Regularly review and update your application's settings and permissions to accommodate changes in your environment or application requirements.

  • Documentation: Both Microsoft's Azure documentation and SharePoint documentation are excellent resources. Refer to these regularly to stay updated on best practices and new features.

By following these steps and considerations, you can successfully set up API access to your SharePoint environment, whether through Azure Application registration or directly within SharePoint. Each method has its own benefits and the best choice depends on your specific needs and environment. Always prioritize security and stay informed on best practices to ensure a successful and safe integration.