Ingest dependencies
Running the following command by itself installs the Unstructured Ingest CLI and the Unstructured Ingest Python library:
pip install unstructured-ingest
This installation includes the following by default:
- The local source connector and the local destination connector.
- Support for the following file types:
File type |
---|
.bmp |
.eml |
.heic |
.html |
.jpg |
.jpeg |
.tiff |
.png |
.txt |
.xml |
To add support for additional file types, run the following:
Command | File type |
---|---|
pip install "unstructured-ingest[csv]" | .csv |
pip install "unstructured-ingest[doc]" | .doc |
pip install "unstructured-ingest[docx]" | .docx |
pip install "unstructured-ingest[epub]" | .epub |
pip install "unstructured-ingest[md]" | .md |
pip install "unstructured-ingest[msg]" | .msg |
pip install "unstructured-ingest[odt]" | .odt |
pip install "unstructured-ingest[org]" | .org |
pip install "unstructured-ingest[pdf]" | .pdf |
pip install "unstructured-ingest[ppt]" | .ppt |
pip install "unstructured-ingest[pptx]" | .pptx |
pip install "unstructured-ingest[rtf]" | .rtf |
pip install "unstructured-ingest[rst]" | .rst |
pip install "unstructured-ingest[tsv]" | .tsv |
pip install "unstructured-ingest[xlsx]" | .xlsx |
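As a rough illustration of how the two file type tables above fit together, the following Python sketch reports which extras a set of files would need. The helper name `extras_for_files` is hypothetical and is not part of the Unstructured Ingest library. The sketch relies only on the pattern visible in the table above, where each extra is named after its file extension.

```python
from pathlib import Path

# Extensions covered by a plain `pip install unstructured-ingest`,
# copied from the default file type list above.
DEFAULT_EXTENSIONS = {
    ".bmp", ".eml", ".heic", ".html", ".jpg",
    ".jpeg", ".tiff", ".png", ".txt", ".xml",
}

# Extensions that need an extra, copied from the table above.
# In that table, each extra is named after its extension.
EXTRA_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".epub", ".md", ".msg", ".odt", ".org",
    ".pdf", ".ppt", ".pptx", ".rtf", ".rst", ".tsv", ".xlsx",
}


def extras_for_files(paths):
    """Return (extras, unknown) for an iterable of file paths.

    Hypothetical helper: `extras` lists the extras to install, and
    `unknown` lists extensions that appear in neither table.
    """
    extras, unknown = set(), set()
    for path in paths:
        ext = Path(path).suffix.lower()
        if ext in DEFAULT_EXTENSIONS:
            continue  # Supported without any extra.
        if ext in EXTRA_EXTENSIONS:
            extras.add(ext.lstrip("."))
        else:
            unknown.add(ext)
    return sorted(extras), sorted(unknown)


if __name__ == "__main__":
    extras, unknown = extras_for_files(
        ["report.PDF", "notes.txt", "data.csv", "clip.mp4"]
    )
    print(extras)   # ['csv', 'pdf']
    print(unknown)  # ['.mp4']
```

Extensions reported as unknown are simply absent from the tables on this page; the sketch makes no claim about whether they are supported elsewhere.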
To add support for additional connectors, run the following:
Command | Connector type |
---|---|
pip install "unstructured-ingest[airtable]" | Airtable |
pip install "unstructured-ingest[astra]" | Astra DB |
pip install "unstructured-ingest[azure]" | Azure Blob Storage |
pip install "unstructured-ingest[azure-ai-search]" | Azure AI Search |
pip install "unstructured-ingest[biomed]" | Biomed |
pip install "unstructured-ingest[box]" | Box |
pip install "unstructured-ingest[chroma]" | Chroma |
pip install "unstructured-ingest[clarifai]" | Clarifai |
pip install "unstructured-ingest[confluence]" | Confluence |
pip install "unstructured-ingest[couchbase]" | Couchbase |
pip install "unstructured-ingest[databricks-volumes]" | Databricks Volumes |
pip install "unstructured-ingest[delta-table]" | Delta Tables |
pip install "unstructured-ingest[discord]" | Discord |
pip install "unstructured-ingest[dropbox]" | Dropbox |
pip install "unstructured-ingest[elasticsearch]" | Elasticsearch |
pip install "unstructured-ingest[gcs]" | Google Cloud Storage |
pip install "unstructured-ingest[github]" | GitHub |
pip install "unstructured-ingest[gitlab]" | GitLab |
pip install "unstructured-ingest[google-drive]" | Google Drive |
pip install "unstructured-ingest[hubspot]" | HubSpot |
pip install "unstructured-ingest[jira]" | Jira |
pip install "unstructured-ingest[kafka]" | Apache Kafka |
pip install "unstructured-ingest[milvus]" | Milvus |
pip install "unstructured-ingest[mongodb]" | MongoDB |
pip install "unstructured-ingest[notion]" | Notion |
pip install "unstructured-ingest[onedrive]" | OneDrive |
pip install "unstructured-ingest[opensearch]" | OpenSearch |
pip install "unstructured-ingest[outlook]" | Outlook |
pip install "unstructured-ingest[pinecone]" | Pinecone |
pip install "unstructured-ingest[postgres]" | PostgreSQL, SQLite |
pip install "unstructured-ingest[qdrant]" | Qdrant |
pip install "unstructured-ingest[reddit]" | Reddit |
pip install "unstructured-ingest[s3]" | Amazon S3 |
pip install "unstructured-ingest[sharepoint]" | SharePoint |
pip install "unstructured-ingest[salesforce]" | Salesforce |
pip install "unstructured-ingest[singlestore]" | SingleStore |
pip install "unstructured-ingest[snowflake]" | Snowflake |
pip install "unstructured-ingest[sftp]" | SFTP |
pip install "unstructured-ingest[slack]" | Slack |
pip install "unstructured-ingest[wikipedia]" | Wikipedia |
pip install "unstructured-ingest[weaviate]" | Weaviate |
To add support for available embedding libraries, run the following:
Command | Embedding library type |
---|---|
pip install "unstructured-ingest[bedrock]" | Amazon Bedrock |
pip install "unstructured-ingest[embed-huggingface]" | Hugging Face |
pip install "unstructured-ingest[embed-octoai]" | OctoAI |
pip install "unstructured-ingest[embed-vertexai]" | Google Vertex AI |
pip install "unstructured-ingest[embed-voyageai]" | Voyage AI |
pip install "unstructured-ingest[embed-mixedbreadai]" | Mixedbread |
pip install "unstructured-ingest[openai]" | OpenAI |
pip install "unstructured-ingest[togetherai]" | together.ai |
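Each row in the tables above installs one extra, but pip also accepts several extras in a single comma-separated list inside the brackets. The following Python sketch builds such a combined command. The helper name `build_install_command` is hypothetical, and the extras in the example (`pdf`, `s3`, `openai`) are just three names taken from the tables above.

```python
import sys


def build_install_command(extras):
    """Build a pip command that installs unstructured-ingest with extras.

    Hypothetical helper: returns an argument list suitable for
    subprocess.run, without running anything itself.
    """
    # Deduplicate and sort so the command is stable regardless of input order.
    unique = sorted(set(extras))
    requirement = "unstructured-ingest"
    if unique:
        requirement += "[" + ",".join(unique) + "]"
    # `python -m pip` targets the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    command = build_install_command(["pdf", "s3", "openai"])
    # Print everything after the interpreter path.
    print(" ".join(command[1:]))
    # -m pip install unstructured-ingest[openai,pdf,s3]
```

To actually run the installation, the returned list could be passed to `subprocess.run(command, check=True)`. When typing the equivalent command in a shell, keep the quotation marks shown in the tables so that the shell does not interpret the square brackets.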
For details about the specific dependencies that are installed, see setup.py.
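To check which extras an installed copy of the package declares, one option is to read the package metadata with Python's standard `importlib.metadata` module. This is a minimal sketch, not an official Unstructured tool. The helper name `declared_extras` is hypothetical, and the names it prints may be normalized differently from the spellings in the tables above.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(package="unstructured-ingest"):
    """Return the extras an installed package declares.

    Hypothetical helper: returns an empty list if the package
    is not installed in the current environment.
    """
    try:
        meta = metadata(package)
    except PackageNotFoundError:
        return []
    # Each declared extra appears as a "Provides-Extra" metadata field.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    for name in declared_extras():
        print(name)
```

This lists only the extra names. It does not show which dependencies each extra pulls in.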