MarkItDown: How to convert documents to Markdown with Microsoft's tool in 2026

MarGib July 01, 2026

markitdown, an open-source tool from Microsoft, converts Office documents and many other formats to Markdown. In 2026 it is still under development, though updates arrive less often. This guide shows how to install, configure, and use it in practice, from technical documentation to workflow automation.

Converting documents to Markdown using MarkItDown – process visualization.

What is markitdown and what is it used for?

markitdown is a lightweight Python library and command-line interface (CLI) tool created by Microsoft that converts files such as DOCX, PPTX, XLSX, HTML, or PDF to Markdown (`.md`). The project, first published on GitHub in late 2024, was designed primarily to prepare documents for large language models (LLMs) and text-analysis pipelines, but it also offers a quick way to move content into a format suited to version control systems, blogs, or technical documentation. It is not the first tool of this type (Pandoc is a well-known competitor), but it stands out for its ease of use and its support for Microsoft Office formats.

Key features of markitdown include:

  • Conversion of headings, lists, tables, and hyperlinks while preserving document structure.
  • Image handling: images are referenced in the output, and an optional LLM integration can generate descriptions for them.
  • Extensibility through third-party plugins and an optional Azure Document Intelligence backend for difficult PDFs.
  • A Python API alongside the CLI, which makes it easy to use from scripts, editors such as VS Code, and documentation workflows.

The tool is particularly useful for automation, for example when migrating content from Word to systems like GitHub Wiki or Read the Docs. It is less suitable for complex documents that rely on macros or custom styles, or for scanned PDFs, which contain no extractable text unless an OCR service such as Azure Document Intelligence is used.

Who created markitdown and what are its origins?

The project was released by Microsoft under the MIT license, which allows free use and modification of the code. It was first published on GitHub in late 2024, and the 0.1.x release series, which split format support into optional dependencies, followed in 2025. In 2026 development continues, albeit with less frequent commits; the latest changes in the main branch are from January 2026.

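markitdown was built for workflows in which Markdown is the target format, particularly feeding documents to large language models. The project README explains the choice: Markdown stays close to plain text while still representing important document structure, and mainstream LLMs understand it natively. Its focus on Office formats also makes it a practical bridge between traditional documents and Markdown-centric platforms such as GitHub.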

How to install markitdown?

Installing the tool requires Python 3.10 or newer; pip installs the required libraries automatically. The methods below work the same way on all supported operating systems.

System requirements

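  • Python 3.10 or newer.
  • Python dependencies are installed automatically by pip; format-specific libraries (e.g., mammoth for DOCX, python-pptx for PPTX, pdfminer.six for PDF) come with optional extras such as markitdown[all].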
  • Operating system: Windows 10/11, macOS 12+, or Linux (Ubuntu 20.04+).

Installation methods

1. Installation via pip (recommended)

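To install markitdown together with the optional dependencies for all supported formats, open your terminal and run:

pip install 'markitdown[all]'

A plain pip install markitdown installs only the core package, so formats such as DOCX or PDF then need their own extras (e.g., 'markitdown[docx,pdf]'). To install the latest development version, which lives in the packages/markitdown subdirectory of the repository, use:

pip install 'markitdown[all] @ git+https://github.com/microsoft/markitdown.git#subdirectory=packages/markitdown'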

2. Installation from source code

Clone the repository and install the package in editable mode:

git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'

3. Using Docker

The repository includes a Dockerfile, so you can build an image from the cloned source (run the build from the repository root):

docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < document.pdf > document.md

The container reads the input file from standard input and writes Markdown to standard output.

Note: Prebuilt images published by third parties are not supported by Microsoft, so use them at your own risk.

Verifying the installation

After installation, verify that the tool works by running:

markitdown --version

You should see the installed version number.
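If you prefer to check the installation from Python, for example in a setup script, the standard library can report the installed version and whether the optional format backends are importable. This is a minimal sketch; the import names checked below (mammoth for DOCX, pptx for PPTX, pdfminer for PDF) are assumptions about which extras you installed, not a list taken from markitdown itself.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Assumed mapping of formats to the import names of their optional backends.
OPTIONAL_BACKENDS = {"docx": "mammoth", "pptx": "pptx", "pdf": "pdfminer"}


def installed_version(package: str = "markitdown") -> str | None:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def missing_backends(backends: dict[str, str] = OPTIONAL_BACKENDS) -> list[str]:
    """List the formats whose backend module cannot be found on this interpreter."""
    return [fmt for fmt, module in backends.items() if find_spec(module) is None]


if __name__ == "__main__":
    print("markitdown version:", installed_version() or "not installed")
    print("formats without a backend:", missing_backends() or "none")
```

An empty "formats without a backend" list suggests the extras you need are in place; otherwise, reinstall with the matching extra.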

How to configure and use markitdown?

markitdown offers both basic and advanced conversion options. Below we discuss the most important features.

Basic usage

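To convert a DOCX file to Markdown, pass the file name and choose the output file with -o:

markitdown document.docx -o document.md

Without -o, the Markdown goes to standard output, so redirection works as well: markitdown document.docx > document.md. The output is always Markdown.

Useful options (run markitdown --help for the full list):

  • -o, --output: Writes the Markdown to a file instead of standard output.
  • -x, --extension: Hints the file type when the input is piped through standard input.
  • -d, --use-docintel together with -e, --endpoint: Uses Azure Document Intelligence instead of offline conversion.
  • --use-plugins: Enables installed third-party plugins.
  • --keep-data-uris: Keeps embedded images as full data URIs instead of truncating them.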
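The same conversion is available from Python through the `MarkItDown` class documented in the project README: `MarkItDown().convert(path)` returns a result whose `text_content` attribute holds the Markdown. The sketch below wraps it in a hypothetical helper, `convert_file`, that writes the result next to the source file by default; `default_output_path` is likewise an illustrative name, not part of markitdown.

```python
from __future__ import annotations

from pathlib import Path


def default_output_path(src: str | Path) -> Path:
    """Place the Markdown next to the source: same name, .md suffix."""
    return Path(src).with_suffix(".md")


def convert_file(src: str | Path, dest: str | Path | None = None) -> Path:
    """Convert one file to Markdown and return the path that was written."""
    # Imported lazily so this module still loads where markitdown is absent.
    from markitdown import MarkItDown

    result = MarkItDown().convert(str(src))
    out = Path(dest) if dest is not None else default_output_path(src)
    out.write_text(result.text_content, encoding="utf-8")
    return out
```

Usage: convert_file("document.docx") writes document.md. Writing with an explicit UTF-8 encoding avoids platform-dependent defaults, which matters for documents with diacritics.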

Advanced configuration

The project documentation describes configuration through the CLI flags above and the Python API rather than a configuration file. The MarkItDown class accepts keyword arguments such as:

  • enable_plugins=True: Activates installed third-party plugins.
  • docintel_endpoint="<endpoint>": Sends documents to Azure Document Intelligence (requires the az-doc-intel extra and Azure credentials).
  • llm_client and llm_model: Let a multimodal model generate descriptions for images.

Full configuration documentation can be found in the project repository.
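markitdown's Python API documents constructor keyword arguments such as `enable_plugins` and `docintel_endpoint`. To keep such settings out of code (for example in CI), you can assemble them from environment variables. This is a minimal sketch: the variable names `MARKITDOWN_PLUGINS` and `DOCINTEL_ENDPOINT` are hypothetical conventions, not something markitdown reads on its own.

```python
from __future__ import annotations

import os
from typing import Any, Mapping


def markitdown_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for MarkItDown(...) from environment variables.

    MARKITDOWN_PLUGINS and DOCINTEL_ENDPOINT are hypothetical variable names.
    """
    kwargs: dict[str, Any] = {
        # Plugins stay off unless explicitly enabled.
        "enable_plugins": env.get("MARKITDOWN_PLUGINS", "").lower() in {"1", "true", "yes"},
    }
    endpoint = env.get("DOCINTEL_ENDPOINT", "").strip()
    if endpoint:
        # Route documents to Azure Document Intelligence only when configured.
        kwargs["docintel_endpoint"] = endpoint
    return kwargs


# Usage (requires markitdown installed):
# from markitdown import MarkItDown
# md = MarkItDown(**markitdown_kwargs())
```

Keeping plugins disabled by default mirrors the README's own example, where `enable_plugins=False` is the safe starting point.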

Integration with other tools

markitdown can be integrated with:

  • VS Code: The markitdown for VS Code extension (last update: January 2025) allows for file conversion directly from the editor.
  • Jupyter Notebooks: markitdown converts notebook files (`.ipynb`) to Markdown directly.
  • CI/CD: The tool can be used in GitHub Actions pipelines to convert documentation automatically on every commit; an example workflow appears in the "Automating conversion" section below.

Limitations and potential issues

Despite its advantages, markitdown has several limitations that are worth knowing before use.

Problematic elements

  • PDF: Conversion relies on the pdfminer.six library, which may cause layout errors (e.g., broken lines, missing images).
  • Tables: Simple tables are converted correctly, but complex ones (with merged cells) may be unreadable.
  • Macros and OLE objects: Not supported.
  • Foreign languages: Diacritics are preserved. For text-based sources such as HTML or CSV, UTF-8 is the safest encoding; the -c (--charset) option can hint a different one.
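Because merged cells are the main source of unreadable tables, it can help to flag affected documents before converting them. The sketch below is a hypothetical preflight check, not a markitdown feature: it opens the DOCX package with the standard library and counts table cells whose properties in `word/document.xml` contain the WordprocessingML elements for horizontal (`w:gridSpan` greater than 1) or vertical (`w:vMerge`) merges.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_merged_cells(docx_path: str) -> int:
    """Count table cells that take part in a horizontal or vertical merge."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    merged = 0
    for cell_props in root.iter(f"{W}tcPr"):
        span = cell_props.find(f"{W}gridSpan")
        spans_columns = span is not None and int(span.get(f"{W}val", "1")) > 1
        if spans_columns or cell_props.find(f"{W}vMerge") is not None:
            merged += 1
    return merged
```

A non-zero count does not mean the conversion will fail, only that the resulting tables deserve a manual review.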

Known bugs

Based on reports in GitHub Issues (as of June 2026), the most common problems are:

  • Issue #45: Problems converting multi-level lists in DOCX.
  • Issue #78: Images in PDFs are not preserved in the output Markdown.
  • Slow performance with large PDF files (reported in GitHub Discussions).

Comparison with alternatives

markitdown is not the only tool for converting documents to Markdown. Below is a comparison with two popular alternatives:

| Tool | Supported formats | Image preservation | Configuration | Activity (2026) |
|---|---|---|---|---|
| markitdown | DOCX, PPTX, HTML, PDF | Yes (with flag) | Advanced | Maintained |
| Pandoc | 50+ formats | Yes | Very advanced | Active |
| Mammoth.JS | DOCX | No | Basic | Maintained |

Pandoc remains more versatile, but markitdown is easier to use and better integrated with Microsoft tools.

What can markitdown be useful for?

markitdown works well in many scenarios, especially where automation or content migration is required.

Use cases

  1. Technical documentation: Converting specifications from Word to Markdown for systems like GitHub Wiki or Read the Docs.
  2. Blogging: Migrating posts from WordPress (HTML) to static site generators (e.g., Hugo, Jekyll).
  3. Education: Processing teaching materials (PPTX) into formats friendly to e-learning platforms.
  4. Automation: Integration with CI/CD (e.g., converting documentation on every commit).

User feedback

Users praise markitdown for:

  • Simplicity and integration with the Microsoft ecosystem (source: Reddit).
  • Ability to customize conversion rules.

Criticism mainly concerns:

  • Slow performance with large PDF files.
  • Limited support for complex tables.

Development status in 2026

Although markitdown is still being developed, the pace of updates has slowed down, and the last commit in the main branch is from January 2026. Excel (XLSX) files are already supported, so the plans for 2026 focus on:

  • Improved PDF handling (better text layout).

The community around the project is active – on GitHub, there are on average 2-3 new threads per month, and on Stack Overflow, you can find about 50 questions with the tag markitdown. However, there is no official Discord or Slack channel.

Best practices and tips

To get the best conversion results, it is worth following a few rules.

Preparing source documents

  • Use standard styles (e.g., "Heading 1" instead of manual formatting).
  • Avoid macros and OLE objects (not supported).
  • For PDFs: Ensure the text is selectable (not a scanned image).
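A quick way to check the first two rules is to count paragraphs that carry heading styles and to look for a macro part in the package. This is a hypothetical preflight sketch: it assumes English Word's built-in heading style IDs (`Heading1`, `Heading2`, and so on), which localized Word versions or custom templates may replace with other IDs, so treat a zero count as a prompt to inspect the document rather than a definitive failure. Macro-enabled packages store their VBA code in `word/vbaProject.bin`.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
HEADING_ID = re.compile(r"^Heading([1-9])$")  # assumed English built-in style IDs


def preflight_docx(docx_path: str) -> dict:
    """Summarize heading usage and macro presence in a DOCX/DOCM package."""
    with zipfile.ZipFile(docx_path) as package:
        names = set(package.namelist())
        root = ET.fromstring(package.read("word/document.xml"))
    levels: Counter = Counter()
    for style in root.iter(f"{W}pStyle"):
        match = HEADING_ID.match(style.get(f"{W}val", ""))
        if match:
            levels[int(match.group(1))] += 1
    return {
        "headings_by_level": dict(sorted(levels.items())),
        "has_macros": "word/vbaProject.bin" in names,
    }
```

A document whose headings are all manual formatting will report an empty `headings_by_level`, which usually means the Markdown output will have no heading structure either.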

Verifying the output Markdown

After conversion, it is worth checking the file's correctness using:

  • markdownlint (CLI) – a tool for validating Markdown syntax.
  • VS Code extensions, e.g., Markdown All in One.
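Since complex tables are the most common casualty of conversion, a small script can flag pipe-table rows whose cell count differs from the table's first row before you review the file by hand. This is a rough heuristic sketch, not a replacement for markdownlint: it assumes tables use leading and trailing `|`, respects backslash-escaped pipes, and does not understand pipes inside inline code.

```python
from __future__ import annotations

import re
from typing import Iterable

UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def cell_count(row: str) -> int:
    """Count cells in a pipe-table row such as '| a | b |'."""
    # Splitting yields empty strings outside the outer pipes; drop those two.
    return len(UNESCAPED_PIPE.split(row.strip())) - 2


def inconsistent_table_rows(lines: Iterable[str]) -> list[int]:
    """Return 1-based line numbers of table rows whose cell count differs from the header."""
    problems: list[int] = []
    expected = None
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            count = cell_count(stripped)
            if expected is None:
                expected = count  # the first row of a table acts as the header
            elif count != expected:
                problems.append(number)
        else:
            expected = None  # any other line ends the current table
    return problems
```

Usage: inconsistent_table_rows(Path("document.md").read_text(encoding="utf-8").splitlines()) returns the line numbers worth checking first.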

Automating conversion

Example Bash script to convert all DOCX files in a directory:

shopt -s nullglob  # skip the loop entirely when no .docx files exist
for file in *.docx; do
  markitdown "$file" -o "${file%.docx}.md"
done
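If you need the same batch conversion without Bash (for example on Windows), or want to skip files whose Markdown is already up to date, a short Python script can drive the CLI. A minimal sketch, assuming the `markitdown` command is on PATH; `plan_conversions` and `run` are hypothetical helper names, and comparing modification times is a simple heuristic rather than a guarantee that content changed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def plan_conversions(folder: Path, pattern: str = "*.docx") -> list[tuple[Path, Path]]:
    """Return (source, target) pairs whose Markdown is missing or older than the source."""
    pairs = []
    for src in sorted(folder.glob(pattern)):
        dest = src.with_suffix(".md")
        if not dest.exists() or dest.stat().st_mtime < src.stat().st_mtime:
            pairs.append((src, dest))
    return pairs


def run(folder: Path) -> None:
    """Convert every pending file, stopping at the first CLI failure."""
    for src, dest in plan_conversions(folder):
        # Documented CLI form: markitdown <input> -o <output>
        subprocess.run(["markitdown", str(src), "-o", str(dest)], check=True)
        print(f"converted {src.name} -> {dest.name}")


# Usage: run(Path("docs"))
```

Passing the arguments as a list avoids shell quoting problems with spaces in file names, and `check=True` makes a failed conversion stop the batch instead of passing silently.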

For more advanced scenarios, consider integration with GitHub Actions. Example workflow:

name: Convert DOCX to Markdown
on: [push]
permissions:
  contents: write  # allow the job to push the converted files
jobs:
  convert:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install MarkItDown
        run: pip install 'markitdown[all]'
      - name: Convert files
        run: |
          shopt -s nullglob
          for file in *.docx; do
            markitdown "$file" -o "${file%.docx}.md"
          done
      - name: Commit changes
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add -- '*.md'
          # Commit and push only when the conversion actually changed something
          git diff --cached --quiet || { git commit -m "Automated DOCX to Markdown conversion" && git push; }

Summary

markitdown is a useful tool for converting office documents to Markdown, especially for users of the Microsoft ecosystem. Although it has its limitations (e.g., issues with PDFs or complex tables), it proves effective in many scenarios – from technical documentation to workflow automation. In 2026, the project is still maintained, albeit with a lower frequency of updates.

If you are looking for a simple and well-integrated tool for converting DOCX or PPTX to Markdown, markitdown is worth considering. For more advanced needs, however, you might want to look at alternatives like Pandoc.

Do you use markitdown in your projects? Share your experiences in the comments!
