Adeloop Multiple Datasets Guide

This guide provides step-by-step instructions for working with multiple datasets in Adeloop: loading, combining, and analyzing data from multiple sources.

Prerequisites

  1. At least two datasets uploaded to your database
  2. Access to the ChartBuilder component
  3. Datasets that share at least one common column for join operations

Dataset Selection UI

Steps:

  1. Open ChartBuilder
  2. Create a new cell (or use an existing one)
  3. Click the Database icon in the cell toolbar
  4. Verify you can see multiple datasets in the dropdown
  5. Select 2-3 datasets using checkboxes
  6. Verify the button text updates to show the count (e.g., "3 datasets")

Loading Multiple Datasets in Python

# Check available datasets
show_datasets()

# Load multiple datasets
df1 = pd.read_csv('dataset1.csv')
df2 = pd.read_csv('dataset2.csv')

print(f"Dataset 1: {df1.shape[0]} rows, {df1.shape[1]} columns")
print(f"Dataset 2: {df2.shape[0]} rows, {df2.shape[1]} columns")

# Show column names
print(f"Dataset 1 columns: {df1.columns.tolist()}")
print(f"Dataset 2 columns: {df2.columns.tolist()}")

# Display first dataset in table tab
result = df1

Combining Multiple Datasets

Concatenation

# Concatenate datasets vertically (stacking rows)
combined_df = pd.concat([df1, df2], ignore_index=True)
print(f"Combined dataset: {combined_df.shape[0]} rows, {combined_df.shape[1]} columns")

# Display combined dataset
result = combined_df
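Concatenation does not require identical columns: pandas aligns rows by column name and fills the gaps with NaN. A minimal sketch (the frames here are illustrative, not actual Adeloop datasets):

```python
import pandas as pd

# Two small frames with only partially overlapping columns
df_a = pd.DataFrame({"id": [1, 2], "region": ["east", "west"]})
df_b = pd.DataFrame({"id": [3], "revenue": [100]})

# Rows stack; columns become the union, with NaN where a frame lacks a column
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked.columns.tolist())           # ['id', 'region', 'revenue']
print(int(stacked["revenue"].isna().sum()))  # 2 (df_a rows have no revenue)
```

If you want to reject mismatched inputs instead of silently filling NaN, compare the column sets before concatenating.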

Merging on Common Columns

# Find common columns
common_cols = set(df1.columns) & set(df2.columns)
print(f"Common columns: {list(common_cols)}")

# Merge datasets if a common column exists
if common_cols:
    # Sets are unordered, so sort for a deterministic join key
    join_key = sorted(common_cols)[0]
    merged_df = pd.merge(df1, df2, on=join_key, how='inner')
    print(f"Merged dataset shape: {merged_df.shape}")

    # Display merged dataset
    result = merged_df
else:
    print("No common columns found; consider concatenation instead")
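When the datasets share more columns than just the join key, pandas disambiguates the overlap with suffixes. A short sketch with made-up frames (the column names are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"employee_id": [1, 2], "name": ["Ana", "Bo"]})
right = pd.DataFrame({"employee_id": [1, 2], "name": ["Ana", "Bo"], "salary": [50, 60]})

# 'name' exists on both sides but is not a join key, so each copy gets a suffix
merged = pd.merge(left, right, on="employee_id", suffixes=("_d1", "_d2"))
print(merged.columns.tolist())  # ['employee_id', 'name_d1', 'name_d2', 'salary']
```

Explicit suffixes make it obvious later which dataset a duplicated column came from; the defaults are `_x` and `_y`.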

SQL Queries with Multiple Datasets

-- Test basic table access with standardized names
SELECT 'Dataset 1' as source, COUNT(*) as row_count FROM dataset1
UNION ALL
SELECT 'Dataset 2' as source, COUNT(*) as row_count FROM dataset2;

-- Join using standardized table names
SELECT
  d1.employee_id,
  d1.name,
  d2.salary
FROM dataset1 d1
JOIN dataset2 d2 ON d1.employee_id = d2.employee_id
LIMIT 10;

Error Handling

try:
    df_nonexistent = pd.read_csv('nonexistent.csv')
except FileNotFoundError as e:
    print("✓ Proper error handling for missing dataset")
    print(f"Error message: {e}")
    
# Show what's actually available
show_datasets()

Performance Considerations

import time

start_time = time.time()

# Load all datasets
datasets = {}
for i in range(1, 4):  # Test with 3 datasets
    try:
        df = pd.read_csv(f'dataset{i}.csv')
        datasets[f'df{i}'] = df
        print(f"Loaded dataset{i}: {df.shape}")
    except FileNotFoundError:
        print(f"Dataset{i} not available")

load_time = time.time() - start_time
print(f"Total loading time: {load_time:.2f} seconds")

# Simple analysis
total_rows = sum(df.shape[0] for df in datasets.values())
print(f"Total rows across all datasets: {total_rows}")

result = f"Loaded {len(datasets)} datasets with {total_rows} total rows in {load_time:.2f}s"

Cross-Language Consistency

-- SQL cell: count rows in the SQL table
SELECT COUNT(*) as sql_count FROM employee_data;

# Python cell: count rows in the same dataset and compare
df = pd.read_csv('employee_data.csv')  # Use actual dataset name
python_count = len(df)
print(f"Python count: {python_count}")
result = f"Row count: {python_count}"

Troubleshooting Common Issues

Issue: "Table not found" in SQL

Solution: Check the dataset names; SQL table names are derived from filenames as follows:

  • Spaces → underscores
  • Special characters → underscores
  • Lowercase conversion
  • File extensions removed
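The conversion rules above can be sketched in Python. This is an approximation of the behavior, not Adeloop's actual implementation, and `standardize` is a hypothetical helper, not an Adeloop API:

```python
import os
import re

def standardize(filename: str) -> str:
    """Approximate the table-name conversion: drop the extension,
    lowercase, then replace spaces and special characters with underscores."""
    base, _ext = os.path.splitext(filename)
    return re.sub(r"[^a-z0-9_]", "_", base.lower())

print(standardize("Sales Report.csv"))     # sales_report
print(standardize("Q1-Results 2024.csv"))  # q1_results_2024
```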

Issue: "File not found" in Python

Solution:

  1. Run show_datasets() to see available files
  2. Use exact filenames shown
  3. Check dataset selection in cell

Issue: Join fails in SQL

Solution:

  1. Verify common columns exist
  2. Check data types match
  3. Use DESCRIBE or sample queries to inspect structure
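A common cause of failed joins is a type mismatch on the key, e.g. an ID stored as text in one dataset and as an integer in the other. Recent pandas versions raise a ValueError when merging an integer key against a string key, and the usual fix is to cast both sides to a common type first. A sketch with illustrative frames:

```python
import pandas as pd

d1 = pd.DataFrame({"employee_id": [1, 2, 3]})                        # integer key
d2 = pd.DataFrame({"employee_id": ["1", "2"], "salary": [50, 60]})   # string key

# Align the key dtypes before merging so the join can match values
d1["employee_id"] = d1["employee_id"].astype(str)
merged = pd.merge(d1, d2, on="employee_id", how="inner")
print(len(merged))  # 2
```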

Issue: Memory errors with large datasets

Solution:

  1. Use LIMIT in SQL queries
  2. Use df.head() or df.sample() in Python
  3. Process datasets in chunks
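The chunked approach in step 3 can be sketched with pandas' `chunksize` parameter, which yields the file piece by piece instead of loading it whole. The CSV below is inlined via StringIO for illustration; in a cell you would pass the dataset filename instead:

```python
import io
import pandas as pd

csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(5))

# Read two rows at a time and aggregate without holding the full file in memory
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["value"].sum()

print(int(total))  # 0 + 10 + 20 + 30 + 40 = 100
```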

Success Criteria

A multi-dataset setup is working correctly when all of the following hold:

  • ✅ Multiple datasets can be selected and used
  • ✅ SQL queries work with proper table names
  • ✅ Python can load datasets via pd.read_csv()
  • ✅ Joins and merges work correctly
  • ✅ Error handling is informative
  • ✅ Performance is acceptable
  • ✅ Data is consistent across languages