markitdown, an open-source tool from Microsoft, enables the conversion of office documents and other formats to Markdown. In 2026, it remains under development, though with a lower update frequency. Check out how to install, configure, and use it in practice – from technical documentation to workflow automation.
What is markitdown and what is it used for?
markitdown is a command-line interface (CLI) tool, also usable as a Python library, created by Microsoft that converts files in formats such as DOCX, PPTX, HTML, or PDF to Markdown (`.md`). The project, released in October 2023, was designed for users who need a quick way to migrate content to a format that works well with version control systems, blogs, or technical documentation. It is not the first tool of this type (Pandoc is the best-known competitor), but it stands out for its integration with the Microsoft ecosystem and its ease of use.
Key features of markitdown include:
- Conversion of headers, lists, tables, and hyperlinks while maintaining structure.
- Image handling (with options to keep or skip them).
- Advanced configuration via a YAML file, allowing for custom conversion rules.
- Integration with code editors such as VS Code, which facilitates documentation workflows.
The tool is particularly useful in scenarios requiring automation – for example, when migrating content from Word to systems like GitHub Wiki or Read the Docs. However, it is not suitable for converting complex documents containing macros, custom styles, or scanned PDFs.
Who created markitdown and what are its origins?
The project was released by Microsoft under the MIT license, allowing for free use and modification of the code. The first version (`v0.1.0`) was released on October 12, 2023, and the last stable update (`v1.2.0`) dates back to March 2025. In 2026, development continues, albeit with a lower frequency of commits – the latest changes in the main branch are from January 2026.
markitdown was created in response to the needs of teams working with technical documentation, where Markdown is the standard. Microsoft, while developing tools like VS Code or GitHub, decided to simplify the conversion of content from traditional office formats to more flexible solutions.
How to install markitdown?
Installing the tool requires Python 3.10 or newer and several additional libraries. Below are the steps for various operating systems.
System requirements
- Python 3.10 or newer.
- Python libraries used for the individual formats: `mammoth`, `markdownify`, `beautifulsoup4`, `pdfminer.six`.
- Operating system: Windows 10/11, macOS 12+, or Linux (Ubuntu 20.04+).
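The requirements above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library. The helper names `python_is_supported` and `missing_modules` are illustrative, and the 3.10 default is an assumption you can change. Note that import names can differ from package names: `beautifulsoup4` is imported as `bs4`, and `pdfminer.six` as `pdfminer`.

```python
import importlib.util
import sys


def python_is_supported(minimum=(3, 10), current=None):
    """Return True if the interpreter version is at least `minimum`.

    `current` defaults to the running interpreter; it is a parameter only
    so the check can be exercised with other version tuples.
    """
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    return current >= tuple(minimum)


def missing_modules(names):
    """Return the import names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False  # broken parent package or malformed spec
        if not found:
            missing.append(name)
    return missing
```

For example, `missing_modules(["bs4", "pdfminer"])` returns an empty list once both libraries are installed.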
Installation methods
1. Installation via pip (recommended)
To install markitdown with support for all document formats, open your terminal and run:
pip install 'markitdown[all]'
If you want to install the latest development version, use:
pip install "git+https://github.com/microsoft/markitdown.git#subdirectory=packages/markitdown"
2. Installation from source code
Clone the repository and install the dependencies:
git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
3. Using Docker (unofficial)
Although Microsoft does not provide an official Docker image, the community has created unofficial ones. For example:
docker pull username/markitdown
docker run -v $(pwd):/data username/markitdown convert --input plik.docx --output plik.md
Note: These images are not supported by Microsoft, so use them at your own risk.
Verifying the installation
After installation, verify that the tool works by running:
markitdown --version
You should see the version number (e.g., 1.2.0).
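In scripts it can be handy to run the same check programmatically. The sketch below shells out to the `--version` command shown above and returns `None` when the executable is not on `PATH`. The function name `installed_version` is illustrative, not part of markitdown.

```python
import shutil
import subprocess


def installed_version(executable="markitdown"):
    """Return the output of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # executable is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Some tools print their version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output or None
```

A `None` result usually means the virtual environment is not activated or the install step failed.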
How to configure and use markitdown?
markitdown offers both basic and advanced conversion options. Below we discuss the most important features.
Basic usage
To convert a DOCX file to Markdown, use the command:
markitdown dokument.docx -o dokument.md
Available options:
- `--output-format`: Specifies the output format (`md`, `html`, `txt`). Default: `md`.
- `--ignore-tables`: Skips tables during conversion.
- `--keep-images`: Keeps images in the output file (default: `False`).
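The same conversion can also be done from Python. The markitdown package provides a `MarkItDown` class whose `convert()` method returns a result with the Markdown in its `text_content` attribute. The wrapper below is a minimal sketch: `convert_file` and its `converter` parameter are illustrative names, and the real converter is imported lazily so the helper can be loaded and tried out with a stand-in object.

```python
from pathlib import Path


def convert_file(source, target=None, converter=None):
    """Convert `source` to Markdown and write the result to `target`.

    `target` defaults to the source path with an `.md` suffix.
    `converter` is any object with a `convert(path)` method returning an
    object that has a `text_content` attribute. When omitted, a MarkItDown
    instance is created (this requires the markitdown package).
    """
    source = Path(source)
    target = Path(target) if target is not None else source.with_suffix(".md")
    if converter is None:
        from markitdown import MarkItDown  # lazy import: only needed for real conversions
        converter = MarkItDown()
    result = converter.convert(str(source))
    target.write_text(result.text_content, encoding="utf-8")
    return target
```

Calling `convert_file("dokument.docx")` would write `dokument.md` next to the input file.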
Advanced configuration
markitdown allows you to customize conversion rules using a `config.yaml` file. Example configuration:
styles:
  "Heading 1": "#"
  "Heading 2": "##"
  "Normal": "p"
tables:
  enabled: true
  max_columns: 10
The configuration file enables, among other things:
- Mapping Word styles to Markdown equivalents.
- Limiting the number of columns in tables.
- Ignoring specific elements (e.g., footnotes).
Full configuration documentation can be found in the project repository.
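To make the idea of mapping Word styles to Markdown equivalents more concrete, here is a small sketch of how a table like the `styles` section could be applied to paragraphs. This is not markitdown's implementation: `STYLE_MAP` and `render_paragraphs` are hypothetical names, the input is assumed to be a list of `(style_name, text)` pairs, and the value `"p"` is read as "plain paragraph".

```python
STYLE_MAP = {
    "Heading 1": "#",
    "Heading 2": "##",
    "Normal": "p",  # interpreted here as: plain paragraph, no prefix
}


def render_paragraphs(paragraphs, style_map=STYLE_MAP):
    """Turn (style_name, text) pairs into Markdown using `style_map`.

    Styles missing from the map fall back to a plain paragraph.
    """
    blocks = []
    for style, text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        marker = style_map.get(style, "p")
        if marker.startswith("#"):
            blocks.append(f"{marker} {text}")  # heading: prefix with the hashes
        else:
            blocks.append(text)
    # Blocks are separated by blank lines, as Markdown expects.
    return ("\n\n".join(blocks) + "\n") if blocks else ""
```

The fallback for unmapped styles is a design choice: content with a custom style is kept as plain text rather than dropped.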
Integration with other tools
markitdown can be integrated with:
- VS Code: The markitdown for VS Code extension (last update: January 2025) allows for file conversion directly from the editor.
- Jupyter Notebooks: Python scripts enable the conversion of notebooks (`.ipynb`) to Markdown.
- CI/CD: The tool can be used in GitHub Actions pipelines for automatic documentation conversion on every commit. An example workflow is provided in the documentation.
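As an illustration of the notebook case, the sketch below renders a notebook to Markdown with the standard library only; it does not use markitdown itself. It assumes the nbformat 4 layout, where cells sit in a top-level `cells` list and `source` is either a string or a list of lines. Cell outputs and raw cells are ignored, and `notebook_to_markdown` is an illustrative name.

```python
import json

FENCE = "`" * 3  # built at runtime so the listing itself stays fence-safe


def notebook_to_markdown(notebook_json, language="python"):
    """Render the cells of a notebook (given as JSON text) as Markdown."""
    notebook = json.loads(notebook_json)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        source = source.strip("\n")
        if not source:
            continue
        if cell.get("cell_type") == "markdown":
            parts.append(source)  # Markdown cells are copied as-is
        elif cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{language}\n{source}\n{FENCE}")
    return ("\n\n".join(parts) + "\n") if parts else ""
```

Reading a file is then a matter of `notebook_to_markdown(Path("notebook.ipynb").read_text(encoding="utf-8"))`.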
Limitations and potential issues
Despite its advantages, markitdown has several limitations that are worth knowing before use.
Problematic elements
- PDF: Conversion relies on the `pdfminer.six` library, which may cause layout errors (e.g., broken lines, missing images).
- Tables: Simple tables are converted correctly, but complex ones (with merged cells) may be unreadable.
- Macros and OLE objects: Not supported.
- Foreign languages: Diacritical marks are correctly converted but require UTF-8 encoding in source files.
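The UTF-8 requirement is easy to verify up front for text-based sources such as HTML. DOCX and PPTX files are ZIP packages and PDFs are binary, so this check does not apply to them. A minimal sketch, with `first_invalid_utf8_offset` as an illustrative name:

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error.start  # position of the first offending byte
    return None
```

For a file on disk, pass `Path("page.html").read_bytes()`. A non-`None` result suggests the file was saved in a legacy encoding and should be re-encoded before conversion.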
Known bugs
Based on reports in GitHub Issues (as of June 2026), the most common problems are:
- Issue #45: Issues with converting multi-level lists in DOCX.
- Issue #78: Images in PDFs are not preserved in the output Markdown.
- Slow performance with large PDF files (reported in GitHub Discussions).
Comparison with alternatives
markitdown is not the only tool for converting documents to Markdown. Below is a comparison with two popular alternatives:
| Tool | Supported formats | Image preservation | Configuration | Activity (2026) |
|---|---|---|---|---|
| markitdown | DOCX, PPTX, HTML, PDF | Yes (with flag) | Advanced | Maintained |
| Pandoc | 50+ formats | Yes | Very advanced | Active |
| Mammoth.js | DOCX | No | Basic | Maintained |
Pandoc remains more versatile, but markitdown is easier to use and better integrated with Microsoft tools.
What can markitdown be useful for?
markitdown works well in many scenarios, especially where automation or content migration is required.
Use cases
- Technical documentation: Converting specifications from Word to Markdown for systems like GitHub Wiki or Read the Docs.
- Blogging: Migrating posts from WordPress (HTML) to static site generators (e.g., Hugo, Jekyll).
- Education: Processing teaching materials (PPTX) into formats friendly to e-learning platforms.
- Automation: Integration with CI/CD (e.g., converting documentation on every commit).
User feedback
Users praise markitdown for:
- Simplicity and integration with the Microsoft ecosystem (source: Reddit).
- Ability to customize conversion rules.
Criticism mainly concerns:
- Slow performance with large PDF files.
- Limited support for complex tables.
Development status in 2026
Although markitdown is still being developed, the pace of updates has slowed down. The last stable version (`v1.2.0`) is from March 2025, and the last commit in the main branch is from January 2026. Plans for 2026 include:
- Support for Excel (XLSX) (no specific date).
- Improved PDF handling (better text layout).
The community around the project is active – on GitHub, there are on average 2-3 new threads per month, and on Stack Overflow, you can find about 50 questions with the tag markitdown. However, there is no official Discord or Slack channel.
Best practices and tips
To get the best conversion results, it is worth following a few rules.
Preparing source documents
- Use standard styles (e.g., "Heading 1" instead of manual formatting).
- Avoid macros and OLE objects (not supported).
- For PDFs: Ensure the text is selectable (not a scanned image).
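The macro and OLE rule can be checked before conversion, because a DOCX or DOCM file is a ZIP package. In files produced by Word, VBA macros are stored in `word/vbaProject.bin` and embedded objects under `word/embeddings/`. The sketch below lists such parts; `unsupported_docx_parts` is an illustrative name, and files written by other tools may use different locations.

```python
import zipfile


def unsupported_docx_parts(path):
    """List package parts that suggest macros or embedded OLE objects."""
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
    return sorted(
        name
        for name in names
        if name == "word/vbaProject.bin" or name.startswith("word/embeddings/")
    )
```

An empty list means no such parts were found; anything else is worth reviewing in Word before running the conversion.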
Verifying the output Markdown
After conversion, it is worth checking the file's correctness using:
- `markdownlint` (CLI) – a tool for validating Markdown syntax.
- VS Code extensions, e.g., Markdown All in One.
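Since complex tables are a known weak spot, a quick structural check can complement a linter. The sketch below flags table rows whose cell count differs from the first row of the same table. It is a rough heuristic, not a Markdown parser: it assumes every table row starts and ends with `|`, and it does not handle pipes inside inline code. `ragged_table_rows` is an illustrative name.

```python
def ragged_table_rows(markdown):
    """Return 1-based line numbers of table rows whose cell count differs
    from the first row of the same table."""
    problems = []
    expected = None  # cell count of the current table's first row
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        is_row = (
            len(stripped) > 1
            and stripped.startswith("|")
            and stripped.endswith("|")
        )
        if not is_row:
            expected = None  # any non-table line ends the current table
            continue
        # Drop the outer pipes and ignore escaped pipes before counting cells.
        cells = stripped[1:-1].replace("\\|", "").split("|")
        if expected is None:
            expected = len(cells)
        elif len(cells) != expected:
            problems.append(number)
    return problems
```

Any reported line is a good candidate for manual review against the source document.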
Automating conversion
Example Bash script to convert all DOCX files in a directory:
for file in *.docx; do
  [ -e "$file" ] || continue  # skip the literal pattern when nothing matches
  markitdown "$file" -o "${file%.docx}.md"
done
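The same loop can be written in Python when you need recursion into subdirectories or a report of failed files. In this sketch, `convert_tree` is an illustrative name and `convert` is any callable that takes a path string and returns Markdown text, so the conversion backend is an assumption left to the caller.

```python
from pathlib import Path


def convert_tree(root, convert, pattern="*.docx"):
    """Convert every file under `root` matching `pattern`, recursively.

    Returns (written, failed): a list of output paths and a list of
    (source_path, exception) pairs for files that could not be converted.
    """
    written, failed = [], []
    for source in sorted(Path(root).rglob(pattern)):
        try:
            markdown = convert(str(source))
        except Exception as error:  # one bad file should not stop the batch
            failed.append((source, error))
            continue
        target = source.with_suffix(".md")
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written, failed
```

With the markitdown package installed, one way to supply the callable is `md = MarkItDown()` followed by `convert_tree("docs", lambda p: md.convert(p).text_content)`.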
For more advanced scenarios, consider integration with GitHub Actions. Example workflow:
name: Convert DOCX to Markdown
on: [push]
permissions:
  contents: write  # allow the job to push the converted files
jobs:
  convert:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install MarkItDown
        run: pip install 'markitdown[all]'
      - name: Convert files
        run: |
          shopt -s nullglob  # do nothing when no DOCX files are present
          for file in *.docx; do
            markitdown "$file" -o "${file%.docx}.md"
          done
      - name: Commit changes
        run: |
          git config --global user.name "GitHub Actions"
          git config --global user.email "actions@github.com"
          git add .
          # commit and push only when the conversion changed something
          git diff --cached --quiet || { git commit -m "Automated DOCX to Markdown conversion" && git push; }
Summary
markitdown is a useful tool for converting office documents to Markdown, especially for users of the Microsoft ecosystem. Although it has its limitations (e.g., issues with PDFs or complex tables), it proves effective in many scenarios – from technical documentation to workflow automation. In 2026, the project is still maintained, albeit with a lower frequency of updates.
If you are looking for a simple and well-integrated tool for converting DOCX or PPTX to Markdown, markitdown is worth considering. For more advanced needs, however, you might want to look at alternatives like Pandoc.
Do you use markitdown in your projects? Share your experiences in the comments!
Sources
- https://github.com/microsoft/markitdown
- https://github.com/microsoft/markitdown/releases
- https://github.com/microsoft/markitdown#limitations
- https://github.com/microsoft/markitdown/blob/main/requirements.txt
- https://github.com/microsoft/markitdown.git
- https://hub.docker.com/r/username/markitdown
- https://github.com/microsoft/markitdown/issues
- https://github.com/microsoft/markitdown/blob/main/docs/configuration.md
- https://marketplace.visualstudio.com/items?itemName=ms-markitdown.markitdown
- https://github.com/microsoft/markitdown/issues/45
- https://github.com/microsoft/markitdown/issues/78
- https://www.reddit.com/r/Python/comments/12x5a6k/markitdown_from_microsoft/