Managing data volume limits

June 12, 2026

josephinerohner
Community Manager

This week @yier.wu (Customer Support Specialist) and @frederic.passaniti (Solutions Architect) have broken down the biggest takeaways, limitations, and clever strategies discussed to keep your data publishing seamless.

"Volume limits should never be an obstacle to publishing your datasets." - @Nicolas Terpolilli, Head of Product Delivery

📊 Understanding the limits

The platform enforces three distinct technical limits. The first two depend on how your datasets are ingested; the third depends on their structure:

1. Local file uploads (240 MB)

The limit: applies to individual source files (CSV, JSON, XLSX, etc.) uploaded directly from your computer.
Pro-tip: a single dataset can aggregate multiple sources, meaning you can combine multiple files that are each under 240 MB.

2. Remote connections (500 MB)

The limit: applies to remote server data fetches via URLs, APIs, or data warehouse connectors (Snowflake, Databricks, Google Drive, etc.).
Watch out: if an API response exceeds 500 MB, you'll receive a transfer error. Large responses can also trigger connection timeouts, depending on network speed or API configuration.
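If you want to spot an oversized source before connecting it, one option is a quick pre-flight check. The sketch below is only an illustration: it assumes the remote server answers HEAD requests and advertises a Content-Length header (many APIs do not), and it assumes "MB" means 10**6 bytes. The function names are made up for this example and are not part of Huwise.

```python
import urllib.request

# Assumption: "500 MB" is read as 500 * 10**6 bytes.
REMOTE_LIMIT_BYTES = 500 * 1000 * 1000


def exceeds_remote_limit(content_length, limit=REMOTE_LIMIT_BYTES):
    """Return True or False from a Content-Length value, or None if it is unknown."""
    if content_length is None:
        return None
    return int(content_length) > limit


def remote_size_check(url, timeout=30):
    """Send a HEAD request and compare the advertised size with the limit."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return exceeds_remote_limit(response.headers.get("Content-Length"))
```

A result of None means the server did not say how big the response is, so the check is inconclusive rather than a green light.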

3. Structural column limits (500 total columns)

Even if your file size is small, Huwise caps total columns at 500 per dataset, with specific sub-caps per data type:

Max 150 Text/Number columns
Max 100 Date/Datetime columns
Max 50 Geographic columns
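
To check a schema against these caps before publishing, a small script can count columns per type. This is a rough sketch, not a Huwise tool: the type labels and the way they are grouped into buckets (for example, treating Text and Number as one shared cap of 150) are assumptions based on the wording above.

```python
from collections import Counter

# Caps as quoted in this post.
TOTAL_CAP = 500
TYPE_CAPS = {"text_number": 150, "date": 100, "geo": 50}

# Hypothetical mapping from a column's type label to a cap bucket.
# Grouping text with number, and date with datetime, is an assumption.
BUCKETS = {
    "text": "text_number",
    "number": "text_number",
    "date": "date",
    "datetime": "date",
    "geo": "geo",
}


def check_column_caps(column_types):
    """Return a list of cap violations for a {column_name: type} mapping."""
    problems = []
    if len(column_types) > TOTAL_CAP:
        problems.append(
            f"{len(column_types)} columns exceeds the total cap of {TOTAL_CAP}"
        )
    # Unknown type labels land in an uncapped "other" bucket.
    counts = Counter(BUCKETS.get(t, "other") for t in column_types.values())
    for bucket, cap in TYPE_CAPS.items():
        if counts[bucket] > cap:
            problems.append(
                f"{counts[bucket]} {bucket} columns exceeds the cap of {cap}"
            )
    return problems
```

An empty list means the schema fits within every cap listed here.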

🛠️ Strategies to bypass volume limits

Before diving into a technical fix, the team recommends asking three strategic questions: What kind of data is it? How will users interact with it? Do you have control over the source API formatting?

Based on your answers, here are the best ways to optimize your workflow:

🗜️ Compress or split local files

Text-heavy CSVs: files that are about 90% text can shrink by 60% to 90% when zipped. Huwise natively unzips and indexes standard .zip, .gz, and .gzip archives on upload.
Divide and conquer: break large datasets into chunks under 240 MB each with a free tool such as csvsplit. Huwise then merges the chunks back into a single dataset automatically.
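
If you'd rather script both steps yourself, here is a minimal sketch using only the Python standard library. It is not how csvsplit or Huwise work internally; the function names are made up, the files are assumed to be UTF-8, and "240 MB" is read as 240 * 10**6 bytes.

```python
import csv
import gzip
import io
import shutil
from pathlib import Path

# Assumption: "240 MB" is read as 240 * 10**6 bytes.
LOCAL_LIMIT_BYTES = 240 * 1000 * 1000


def gzip_file(src, dst):
    """Write a gzip-compressed copy of src to dst and return its size in bytes."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dst).stat().st_size


def _encoded_size(row):
    """Size in bytes of one CSV row as csv.writer would emit it in UTF-8."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return len(buf.getvalue().encode("utf-8"))


def split_csv(src, out_dir, max_bytes=LOCAL_LIMIT_BYTES):
    """Split src into part files of at most max_bytes, repeating the header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.reader(fin)
        header = next(reader, None)
        if header is None:
            return parts  # empty source file, nothing to split
        header_size = _encoded_size(header)
        out = writer = None
        written = 0
        for row in reader:
            size = _encoded_size(row)
            # Start a new part when the next row would push us over the limit.
            if out is None or written + size > max_bytes:
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(parts) + 1:03d}.csv"
                out = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                written = header_size
                parts.append(path)
            writer.writerow(row)
            written += size
        if out is not None:
            out.close()
    return parts
```

One caveat: a single row that is larger than the limit on its own still gets written, so that part would exceed the limit. Try compression first, since it often avoids the need to split at all.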

🌐 Leverage a free Huwise FTP server

If zipping isn't enough, Huwise offers a free FTP hosting server to its clients.

Moving files to FTP turns them into a "remote connection," which bypasses the local upload limit.
It lets you schedule automatic, hands-free dataset updates from the back office.
It supports incremental updates: only newly added monthly or daily files are indexed, which saves hours of processing time on massive historical data.
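
On your side of the transfer, a small script can push only the files that are not on the server yet. This is a client-side sketch with Python's ftplib, not a description of Huwise's own incremental indexing: the host, credentials, and folder are placeholders you would replace, and files are matched by name only.

```python
from ftplib import FTP
from pathlib import Path


def upload_new_files(ftp, local_dir, pattern="*.csv"):
    """Upload files matching pattern that are not already on the server."""
    # Note: some servers raise an error from nlst() on an empty directory.
    remote_names = set(ftp.nlst())
    uploaded = []
    for path in sorted(Path(local_dir).glob(pattern)):
        if path.name in remote_names:
            continue  # already on the server, skip it
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh)
        uploaded.append(path.name)
    return uploaded


def sync_directory(host, user, password, local_dir):
    """Connect, log in, and upload new files. All arguments are placeholders."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_new_files(ftp, local_dir)
```

Plain FTP is unencrypted. If your server supports TLS, ftplib's FTP_TLS class is the safer choice.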

🧮 Data aggregation (Changing scales)

Do your users really need raw data? For instance, the French National Health Insurance (CNAM) dashboard condenses over 1.5 billion rows of medical records into 300,000 readily usable rows by pre-aggregating the raw data into regional and departmental metrics. This satisfies open-data privacy rules while drastically shrinking file sizes.
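
To give a feel for what pre-aggregation means in practice, here is a toy sketch that rolls individual records up into one row per region. The field names ("region", "amount") are invented for the example and have nothing to do with the actual CNAM pipeline.

```python
from collections import defaultdict


def aggregate_by_region(rows, region_key="region", value_key="amount"):
    """Collapse raw rows into one row per region with a count and a total."""
    totals = defaultdict(lambda: {"records": 0, "total": 0.0})
    for row in rows:
        bucket = totals[row[region_key]]
        bucket["records"] += 1
        bucket["total"] += float(row[value_key])
    # One output row per region, sorted by region name.
    return [{region_key: region, **stats} for region, stats in sorted(totals.items())]
```

For example, three raw rows covering regions A, B, and A come out as two rows. The same idea applied at scale is what turns billions of records into a dataset that fits comfortably under the limits.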

📦 Column fusion (JSON objects)

If you're hitting the text column limit, you can consolidate hundreds of secondary technical columns into a single column structured as a JSON object. While you lose standard table filtering on those specific keys, it effectively bypasses the 150-text-column ceiling.
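
Here is a small sketch of that fusion step, applied to one row at a time before upload. The column names and the "extra" column are hypothetical; pick whatever fits your dataset.

```python
import json


def fuse_columns(row, keep, fused_name="extra"):
    """Keep the listed columns and pack every other column into one JSON string."""
    fused = {key: value for key, value in row.items() if key not in keep}
    out = {key: row[key] for key in keep}
    # sort_keys keeps the JSON stable from one row to the next.
    out[fused_name] = json.dumps(fused, ensure_ascii=False, sort_keys=True)
    return out
```

Calling fuse_columns(row, ["id", "name"]) on a row with dozens of secondary fields returns just three columns: id, name, and extra.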

🌟 The game changer: zero-copy data sharing

For enterprise environments using Snowflake or Databricks (newly added in Q1 2026), Huwise has introduced Zero-Copy Data Sharing.

Instead of duplicating and indexing massive data warehouses natively onto Huwise, this feature creates a virtual map. Your data stays securely stored in its original warehouse, but your external users can query, search, view tables, and filter it smoothly through the clean Huwise portal interface without consuming any of your platform storage quotas!

📚 Looking for more?

The team has published a variety of step-by-step guides in the Huwise Academy covering:

Native split/compression workflows
Setting up your dedicated FTP account
Implementing Zero-Copy warehouse connectors

Have a particularly unusual or complex enterprise dataset? Tell us more in the comments! 👇