# Adeloop Multiple Datasets Guide
This guide provides step-by-step instructions for working with multiple datasets in Adeloop: loading, combining, and analyzing data from several sources.
## Prerequisites
- At least two datasets uploaded to your database
- Access to the ChartBuilder component
- Datasets that share at least one common column, for join operations
## Dataset Selection UI

Steps:
1. Open ChartBuilder
2. Create a new cell (or use an existing one)
3. Click the Database icon in the cell toolbar
4. Verify that you can see multiple datasets in the dropdown
5. Select 2-3 datasets using the checkboxes
6. Verify that the button text updates to show the count (e.g., "3 datasets")
## Loading Multiple Datasets in Python

```python
import pandas as pd

# Check available datasets
show_datasets()

# Load multiple datasets
df1 = pd.read_csv('dataset1.csv')
df2 = pd.read_csv('dataset2.csv')

print(f"Dataset 1: {df1.shape[0]} rows, {df1.shape[1]} columns")
print(f"Dataset 2: {df2.shape[0]} rows, {df2.shape[1]} columns")

# Show column names
print(f"Dataset 1 columns: {df1.columns.tolist()}")
print(f"Dataset 2 columns: {df2.columns.tolist()}")

# Display the first dataset in the table tab
result = df1
```

## Combining Multiple Datasets
### Concatenation

```python
# Concatenate datasets vertically (stacking rows)
combined_df = pd.concat([df1, df2], ignore_index=True)
print(f"Combined dataset: {combined_df.shape[0]} rows, {combined_df.shape[1]} columns")

# Display the combined dataset
result = combined_df
```

### Merging on Common Columns
```python
# Find common columns
common_cols = set(df1.columns) & set(df2.columns)
print(f"Common columns: {list(common_cols)}")

# Merge datasets only if a common column exists
if common_cols:
    merged_df = pd.merge(df1, df2, on=list(common_cols)[0], how='inner')
    print(f"Merged dataset shape: {merged_df.shape}")

    # Display the merged dataset
    result = merged_df
```

## SQL Queries with Multiple Datasets
```sql
-- Test basic table access with standardized names
SELECT 'Dataset 1' AS source, COUNT(*) AS row_count FROM dataset1
UNION ALL
SELECT 'Dataset 2' AS source, COUNT(*) AS row_count FROM dataset2;

-- Join using standardized table names
SELECT
    d1.employee_id,
    d1.name,
    d2.salary
FROM dataset1 d1
JOIN dataset2 d2 ON d1.employee_id = d2.employee_id
LIMIT 10;
```

## Error Handling
```python
try:
    df_nonexistent = pd.read_csv('nonexistent.csv')
except FileNotFoundError as e:
    print("✓ Proper error handling for missing dataset")
    print(f"Error message: {e}")
    # Show what's actually available
    show_datasets()
```

## Performance Considerations
```python
import time

start_time = time.time()

# Load all datasets
datasets = {}
for i in range(1, 4):  # Test with 3 datasets
    try:
        df = pd.read_csv(f'dataset{i}.csv')
        datasets[f'df{i}'] = df
        print(f"Loaded dataset{i}: {df.shape}")
    except FileNotFoundError:
        print(f"Dataset{i} not available")

load_time = time.time() - start_time
print(f"Total loading time: {load_time:.2f} seconds")

# Simple analysis
total_rows = sum(df.shape[0] for df in datasets.values())
print(f"Total rows across all datasets: {total_rows}")

result = f"Loaded {len(datasets)} datasets with {total_rows} total rows in {load_time:.2f}s"
```

## Cross-Language Consistency
```sql
-- Count rows in SQL
SELECT COUNT(*) AS sql_count FROM employee_data;
```

```python
# Count rows in Python and compare with the SQL result
df = pd.read_csv('employee_data.csv')  # Use actual dataset name
python_count = len(df)
print(f"Python count: {python_count}")

result = f"Row count: {python_count}"
```

## Troubleshooting Common Issues
### Issue: "Table not found" in SQL

Solution: Check dataset names and ensure they're converted properly:

- Spaces → underscores
- Special characters → underscores
- Uppercase → lowercase
- File extensions removed
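For illustration, the conversion rules above can be sketched as a small Python helper. Note that `normalize_table_name` is a hypothetical function written here to demonstrate the rules; Adeloop's actual implementation may differ in edge cases:

```python
import os
import re

def normalize_table_name(filename: str) -> str:
    """Apply the conversion rules above: drop the file extension,
    lowercase everything, and turn spaces/special characters into underscores."""
    base, _ext = os.path.splitext(filename)              # remove file extension
    base = base.lower()                                  # lowercase conversion
    # collapse any run of non-alphanumeric characters into one underscore
    return re.sub(r'[^a-z0-9]+', '_', base).strip('_')

print(normalize_table_name("Employee Data (2024).csv"))  # employee_data_2024
```

If a SQL query fails with "Table not found", running the dataset's filename through rules like these usually reveals the table name to use.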
### Issue: "File not found" in Python

Solution:

- Run `show_datasets()` to see the available files
- Use the exact filenames shown
- Check the dataset selection in the cell
### Issue: Join fails in SQL

Solution:

- Verify that common columns exist
- Check that data types match
- Use `DESCRIBE` or sample queries to inspect structure
### Issue: Memory errors with large datasets

Solution:

- Use `LIMIT` in SQL queries
- Use `df.head()` or `df.sample()` in Python
- Process datasets in chunks
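Chunked processing can be sketched with pandas' built-in `chunksize` option on `pd.read_csv`, which streams the file instead of loading it whole (the chunk size below is illustrative; tune it to your memory budget):

```python
import pandas as pd

def count_rows_in_chunks(path: str, chunksize: int = 50_000) -> int:
    """Stream a CSV in fixed-size chunks so peak memory stays
    bounded by one chunk rather than the whole file."""
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += len(chunk)  # aggregate per chunk, then let it be freed
    return total
```

The same pattern works for any per-chunk aggregation (sums, group counts, filtering rows into a smaller output file).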
## Success Criteria

All of the following should hold:

- ✅ Multiple datasets can be selected and used
- ✅ SQL queries work with proper table names
- ✅ Python can load datasets via `pd.read_csv()`
- ✅ Joins and merges work correctly
- ✅ Error handling is informative
- ✅ Performance is acceptable
- ✅ Data is consistent across languages
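As a rough smoke test of the combine-and-merge criteria above, here is a self-contained sketch that uses two tiny in-memory frames in place of real uploaded datasets (the column names are illustrative):

```python
import pandas as pd

# Two tiny frames standing in for uploaded datasets
df1 = pd.DataFrame({"employee_id": [1, 2, 3], "name": ["Ana", "Bo", "Cy"]})
df2 = pd.DataFrame({"employee_id": [2, 3, 4], "salary": [50, 60, 70]})

# Concatenation stacks rows; columns missing on one side become NaN
combined = pd.concat([df1, df2], ignore_index=True)
assert len(combined) == len(df1) + len(df2)

# An inner merge keeps only employee_ids present in both frames
merged = df1.merge(df2, on="employee_id", how="inner")
assert set(merged["employee_id"]) == {2, 3}

print("concat and merge checks passed")
```

Running a check like this against your actual datasets is a quick way to confirm the row counts and join keys behave as expected before building charts on top of them.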