Capture web pages to local device or backend server for future retrieval, organization, annotation, and editing.
WebScrapBook is a browser extension that captures web pages faithfully in various archive formats with customizable configurations, for future retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.

Features:
1. Capture faithfully: A web page shown in the browser can be captured without losing any subtle detail. Metadata such as source URL and timestamp are also recorded.
2. Customizable capture: WebScrapBook can save a selected area in a page, save the source page (before it is processed by scripts), or save a page as a bookmark. How to capture images, audio, video, fonts, frames, styles, scripts, etc. is also customizable. A web page can be saved as a folder, a ZIP-based archive file (HTZ or MAFF), or a single HTML file.
3. Page editing: A web page can be highlighted, annotated, or edited before or after a capture.
4. Organizable collections: Captured pages can be organized in the browser sidebar using one or more scrapbooks, and each scrapbook holds a hierarchical tree structure to organize data items. Notes in HTML or Markdown format can also be created and managed. (*)
5. Fulltext searching: Each scrapbook can be further indexed for a feature-rich search (using title, fulltext, comment, source URL, create time, modify time, etc.). (*)
6. Remote access: Captured data can be hosted with a central backend server and be read or edited from other devices. Alternatively, a scrapbook can generate a static site index and be distributed as a static web site. (*)
7. Mobile support: WebScrapBook supports mobile browsers such as Firefox for Android and Kiwi browser. You can capture and edit web pages from a mobile phone or tablet.
8. Legacy ScrapBook support: Scrapbooks created from legacy ScrapBook or ScrapBook X can be converted into a WebScrapBook-compliant format for use.

(*) All or partial functionality of a starred feature above requires a running collaborating backend server, which can be easily set up using PyWebScrapBook.
(*) An HTZ or MAFF archive file can be viewed using the built-in archive page viewer, with PyWebScrapBook or other assistant tools, or by opening the index page after unzipping.

See Also:
* For further information and frequently asked questions, visit the documentation wiki: https://github.com/danny0838/webscrapbook/wiki/Intro
* For better discussion, please report an issue to the source repository: https://github.com/danny0838/webscrapbook/issues
* Donate to support us if you find this tool helpful: https://www.paypal.me/danny0838/5usd
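The last viewing option mentioned above (unzipping an archive and opening its index page) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_index_page` is hypothetical, and the assumption that the index page is called `index.html` and sits either at the archive root or one folder down is not confirmed by the description above.

```python
import tempfile
import zipfile
from pathlib import Path


def find_index_page(archive_path, dest_dir=None):
    """Unzip a ZIP-based archive and return the path of its index page.

    Assumption (not stated in the description): the index page is named
    index.html and lives at the archive root or in a first-level folder.
    Returns None if no such file is found.
    """
    dest = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp(prefix="archive_"))
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    # Prefer a root-level index page, then fall back to first-level folders.
    candidates = [dest / "index.html"] + sorted(dest.glob("*/index.html"))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

The returned path can then be opened in any browser, for example with `webbrowser.open(find_index_page("page.htz").as_uri())`.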
- (2023-06-04) ru ve: I was hoping to save a complete webpage for offline use. I enabled the extension and just tried the default "Capture Page" feature. I got a new dialog box to save every single resource on the page, like every image and CSS file. How absurd. The page I wanted to save contained a very simple gallery of thumbnails linked to larger images. The extension saved only the thumbnails, and they were still linking to the large images online. I find this extension completely useless and very user-unfriendly.
- (2023-03-22) Notnilc: I have very limited programming experience so there might be some Dunning-Kruger at play here, but this is much, much, much, MUCH better than HTTrack or Cyotek WebCopy. The documentation is great and all, but I feel like most of it could be made redundant with a simple video tutorial.
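- (2022-11-07) design Source: It keeps showing that the download failed, but it works fine in Firefox.
- (2022-08-19) 张明浩: 1. Clips and annotates HTML perfectly. 2. Provides a Python backend, so pages saved on one device can be viewed on many (see the docker-PyWebScrapBook GitHub repository). 3. Free; thanks to the author for this labor of love. Great extension 💖💖💖
- (2021-12-03) Алексей kabelsis: Terrible! It opens a million save dialogs for images and files when saving a whole page.
- (2021-11-29) Полат Османов: A very useful extension! I recommend it.
- (2021-10-13) F Y: Does not support modifying web page content.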
- (2021-06-22) EDEN EDEN: Good!
- (2021-06-12) Li Su: Agree with Clarence Domesticus. Wonderful extension. As a ScrapBook X user for years, I think this extension is able to do almost the same as ScrapBook X, and the additional features of the PyWebScrapBook backend make it more useful. I can finally stop using the now very, very slow old version of Firefox. Thanks a lot.
- (2021-05-28) null_404: A fantastic tool for annotating web pages.
- (2021-04-18) behrouz 40: Doesn't work.
- (2021-03-15) Frank Maloney: I am sorry to have to give this promising extension 1 star, however I have spent 5 hours trying to save a page and the pages linked from that page. You can play around with depth and filters (God knows what "Each following line is a full URL (with chars following a “#” or space stripped) or a regular expression (e.g. “/^http://example\.com//”)." in the options is supposed to mean). This is by far the most frustrating and time-wasting Chrome extension that I have ever installed. It simply does not work! Lastly, I tried both the Chrome and Firefox versions and only ever end up with the original page being saved, i.e. no subpages.
- (2021-02-22) Сергей Еремеев: It seems to save pages, but the extension's "view captured pages" link is inactive, so there is no way to see the list of saved pages.
- (2021-01-23) Дмитрий Доденко: When saving a tab, it asks for confirmation for every file. What if there are hundreds of them...
- (2020-11-10) Reng: Worked perfectly with my old scrapbook files, thank you!! Can import and handle thousands of entries with no problem. The search function is infinitely better than the old scrapbook. Once again thank you so much for this extension.
- (2020-11-07) 叭噗バプ: When capturing, I have to save the files manually, one by one. It's very tedious. How do I configure it?
- (2020-10-16) yun kong: Amazing!
- (2020-10-07) Clarence Domesticus: This is my first experience using any such scrapbook extension & setting up the backend server (PyWebScrapBook, as noted in the extension overview). Overall, I'm very satisfied. It does fail on some sites, but after spending some time tweaking the settings I've been able to use it successfully on most sites I've needed it for. There are plenty of options & features, definitely more than enough to handle nearly any task for which I've needed this extension. I like that it's an ongoing project that's still being updated regularly. I see there are a number of negative reviews from individuals who are upset that this doesn't work like some other scrapbooks they've used in the past (EVEN SOME IN ALL CAPS). This seems to be a common thing reviewers like to do. In my opinion these comparisons are a somewhat unfair basis for 1-star & 2-star ratings. If there are other options that some users enjoy using more, they should use those extensions/scrapbooks instead. I think it would be absurd, for example, if I were to go out of my way to purchase a pack of markers specifically by BrandX and then left negative reviews because they're not BrandY, rather than evaluating BrandX based on its own qualities. Some 1-star reviews are clearly written by users who couldn't be bothered to follow basic configuration instructions or even take a simple look around the menus/folders. I wish ppl would stop doing this s*** not just here but in all their reviews. I just checked GitHub, and it looks like there are commits from as recently as 3 hours ago; like I said, this is an ongoing project, and I'm excited to see where the contributors take it. My only criticism as of right now is that, after updating my Chrome, Firefox, and Edge extensions, and updating to the newest server version, Chrome extension v 78.2 (the most recent version available in the store) is giving error 'Server app requires extension version >= 0.79.0'.
But the FF extension is working perfectly fine. Again, I'm seeing github activity as recently as today, so anticipate that this will be resolved very soon.
- (2020-09-05) פרטי: Doesn't work at all.
- (2020-08-23) David Morales Molina: Very bad; it did not fulfill its purpose.
- (2020-08-05) arshdeep singh gill: Works flawlessly for me
- (2020-07-06) Kang Chen: A very useful extension.
- (2020-04-09) Mehdi Deilami: Seems like the extension doesn't use the cache and re-grabs the resources which is kind of inefficient
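- (2020-04-06) Nima Fariba: Not working. I set Address: http://localhost:8080/ but get an error: Backend initilization error: Unable to connect to backend server.
- (2019-12-27) Jiahuang Zhang: For local HTML files, you can edit and save them again with the formatting perfectly preserved. It's a fantastic tool for taking notes in HTML.
- (2019-12-20) 00 “啊啊啊” 啊啊啊: I have tried all kinds of page-saving extensions, including the web clippers of various cloud note services, and this is almost the only one that can effectively save WeChat official account articles as HTML. Keep it up, author.
- (2019-10-27) 謝昀佑: Is it possible to capture directly from URLs? Capturing all tabs is handy, but I need to open so many tabs that the browser crashes. I hope this feature can be added.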
- (2019-09-08) Funny Domination: I love it! It has many useful options
- (2019-08-29) I K: It saves the pages faithfully
- (2019-08-27) Harris121 Channel: NOT THE SAME AS ORIGINAL SCRAPBOOK AT ALL. THE ONLY THING THAT IS THE SAME IS THE FRIGGIN' ICON! CONFUSING, POPUP BOXES, NO WAY TO "VIEW" THE SCRAPBOOK, NOTHING. I'VE TRIED TO FIGURE THIS OUT FOR THREE HOURS....WHERE ARE THE SAVED PAGES?...GEEZ. NO MARKUPS, HIGHLIGHTING, NOTHING....THIS IS (NOT) THE ORIGINAL FIREFOX EXTENSION....NOT EVEN A "CLOSE" COPY OF IT. (EXCEPT THE FRIGGIN' ICON). SAD. ****IF ANYONE KNOWS OF A GREAT ALTERNATIVE TO THE ORIGINAL FF SCRAPBOOK...PLZ POST!!!!! THANKS!
- (2019-08-02) Arcadiy Tpr: Saves pages nicely. Has a lot of features. I highly recommend it for a reliable offline websites archive.
- (2019-07-19) Dima: Great extension! Please continue development.
- (2019-06-29) Dmitry Kislitsyn: This extension works great! There are still some sites that it fails on, but very few and this is work in progress (check out developer's github), so the extension gets better! Excellent job and keep it up please!
- (2019-06-20) Север Петров: The extension is good, but Mozilla Archive Format (for Firefox), which it is the successor of, saves pages into smaller MAFF files. Here is an example: https://en.wikipedia.org/wiki/Mozilla_Archive_Format was saved as a 57 KB file in Firefox, while the Chrome version saved it as a 206 KB file.
- (2019-05-03) Michael Johnson: I tried this app to save webpages completely and accurately. It works perfectly on some pages like ghacks.net with the scripted single HTML option. On other pages like nytimes.com it captures the page out of sync even though all of the content seems to be there (large gaps, enlarged photos, etc.). Save Page WE has the same issue. On Washingtonpost.com WebScrapBook was almost perfect, but there is a bug that adds incorrect characters if there is an apostrophe in the text (which in a news article there will undoubtedly be). I used the scripted single HTML option on this also. I do have specific scripts for the Times and WPost running, but they are not the issue, since Mozilla Archive Format and SingleFile always work perfectly on the same sites with the same scripts running. But since MAF doesn't work in current browsers and SingleFile works somewhat inconsistently (it stalls a lot), I was hoping WebScrapBook would work, but no go. Also, I haven't seen an option to save the original page URL either in the title or in the .html file for reference, like MAF, SingleFile, or Save Page WE can. This app might be able to save websites, but if it can't do it accurately, what's the point of using it?
- (2019-03-05) Cesar Andrés Vacca Devia: Quite an extension for downloading web pages; an export option is missing.
- (2019-02-04) 雷雨: I used the Firefox version before and it was a very useful extension, but on Chrome it is a piece of crap. Saved HTZ files cannot be opened in Chrome at all; opening one just downloads it again automatically and then nothing happens. I wonder whether the author even tested it. And I have no idea what that index-building feature is supposed to be; there is a pile of text, but I can't understand any of it.
- (2019-01-09) Matt Cooper: Seems to work well. One thing it seems to lack that the old Firefox add in did, is drill down beyond the current tab. I'd like to copy a web page and the linked files as well, but it does not go beyond, even though I can click on the linked files and capture them individually. Is this a supported option I don't see or is it planned to be supported in the future?
- (2018-10-17) Crihy Chu: Doesn't work.
- (2018-04-25) The option "Save captured data in ScrapBook" is ignored and all data is saved to the Download folder. This problem does not occur in Firefox, but the new Firefox (57+) is crap.
- (2018-04-09) Budi Susilo: Though this is NOT the same as Firefox's Scrapbook, this extension can READ and WRITE the now deprecated Firefox .maff format (I have hundreds of them, and I think MAFF is a great format; see the discussion on the WebScrapBook GitHub). The limitation is that this extension is currently limited to single-tab .maff files. Based on the discussion on the WebScrapBook GitHub this might change in the near future. Thank you to the developer for creating such a nice extension.
- (2018-03-03) Avi Schwartz: Crashes when trying to import from the original Scrapbook X.
- (2018-01-12) zech xu: not working in the same way with firefox scrapbook at all.
- (2017-12-12) Дмитрий Горбачёв: Everything was fine, but after the update of 06.10.17 it stopped working (it does not save script files and HTML in "Folder" mode). Please fix the problem, because I really liked the extension.
- (2017-12-08) Mirek eS: How can I change the default name of a saved file?
- (2017-11-22) G. Ivan: Respect!
- (2017-11-10) Nils Andrey Telleria Martinez: Just miss the organizer like in Firefox's old version. But is just great to have single-html and maff features. Thanks!
- (2017-11-09) Bill Gates: It's buggy in Vivaldi. And also not very functional. It's not fair to call it scrapbook. Some alternatives are closer to the Firefox scrapbook, with file system access.
- (2017-10-21) Darren Bardsley: Just saves the page which you can do anyway. I need something like Firefox's Scrapbook.
- (2017-09-15) Юрий: I can't believe it - it's the legendary scrapbook from Firefox!? All functions are working well so far; the only thing missing is a list of all saved pages.
- (2023-07-14, v:2.0.4) curry kumachan: Problem occurs after the update.
I have always found it useful. I'm using it with backend server and a Chrome extension, but something seems to be wrong since I upgraded to version 2.0.1. Site captures are completed without error and I can confirm that the htz files are created, but the meta information tree(meta.js, toc.js) file does not seem to be updated. Also, I can no longer delete or move existing items from "Open Scrapbook menu" (Failed to query: recycle_items(items=[['root', [xx]]): Item not exist: 'root'[xx].) The properties of the existing items seem to be viewed successfully. I don't know which is the problem, the backend or the extension, but it seems to me that the extension is causing the problem. Can you give me some clues to solve this problem? I will provide any missing information.
- (2021-10-09, v:0.125.6) Max VA: Full Site Rip?
I want to rip a full site and while the extension is amazing one page at a time is rough for like 3 million. I'd like to rip a site since httrack and other software won't work since the site is behind cloudflare. This is the only thing that seems to work really well but turning all the links into one that direct to local versions would be weird.
- (2021-03-14, v:0.100.0) Frank Maloney: How to create in-depth captures?
I simply cannot work out how to use this feature. I have set the depth to 2 and yet only the first page is downloaded i.e. links are not traversed? Some expanded documentation would be nice.
- (2021-01-11, v:0.97.0) Javier Steinaker: Overwrite file/folder
Hi, thanks for your extension, super useful and it actually downloads tricky webpages as it should. Is there a way for WebScrapBook to overwrite existing files/folders? I just need to keep the most recent version of a website, always using the same name. If I choose "index" as a file or folder name and it exists, WebScrapBook will save the webpage as "index (1)". Thanks in advance.
- (2020-11-07, v:0.91.0) 叭噗バプ: How to setup PyScrapbook
https://pypi.org/project/webscrapbook/ I only understood that I need to install Python, and I have done that. What should I do next? I want to download the full web file without clicking the "Save" button one by one.
- (2020-11-03, v:0.84.4) En X: WebScrapbook
Hello Danny, I was in love with the original Scrapbook back when I was using Firefox only, and the must-have feature for me was the in-depth capture, because for example if I had an article with several images linked (to the "full size" file) it would be super easy to tell it I wanted to do a 1-depth save and then navigate it offline (and IIRC I had several options to limit what I wanted to save of the linked pages, to save space). My question is: does your extension have that feature? It's really important if I want to organize my "offline archive" :) Thanks and keep up the really good work
- (2020-07-24, v:0.75.4) L. Tallash: Keyboard shortcuts
Dear developer, Thanks for creating this useful extension. I have a suggestion regarding shortcuts for the extension. It would be great to have keyboard shortcuts for each operation such as "capture tab", i.e. to press CTRL+ALT+S to capture a tab, instead of clicking many times to save a tab. I hope you can add this in options. Thanks.
- (2020-04-29, v:0.68.0) Challenger 420: How to download specific folder?
How do I filter a specific folder on the website to download? For example, I want to download everything that contains /shop/ in the URL and not download anything else. I guess this filter should somehow be entered in the "Default search condition" field on the options page. But what's the syntax?
- (2020-04-08, v:0.65.0) Peter Kozej: Annotation
Hello, how do I do page annotation mentioned in description? I've got everything else working as expected using my own backend server, but couldn't figure this out.
- (2020-04-01, v:0.62.0) Alejandro Avila: How
How can I save a website with links followed to a depth of 1? Sorry for my English. Thanks.
- (2019-07-26, v:0.45.0) Максим Клочков: Is there randomization of downloaded files?
For example, I need to download files from a site so that they are renamed automatically. Is that possible here?
- (2019-05-13, v:0.42.0) dan: Firefox import
hi, I'm trying to import my multi-scrapbook from firefox but I so far have not found the right way to do it. can you help?
- (2018-12-26, v:0.27.0) Arthur P. Meiners: Calibre compatible HTMLZ vs HTZ
Dear Mr Lin, I used to like Firefox MAFF files but Mozilla dropped support for their own standard. I've been looking at Web Scrapbook on Android (not very stable yet) and Windows (works well generally), but also at other solutions to ZIP HTML pages. Your HTZ solution is simpler than MAFF, but does not seem compatible with most other solutions. While searching I came across the HTMLZ solution for Calibre (see the simple specs here: https://www.mobileread.com/forums/showthread.php?t=241414 ). This has the advantage of simplicity, as you can even get away with just renaming an .htz to .htmlz and importing into Calibre, but it also allows adding source info into the metadata.opf file, indexing a large collection in Calibre, and converting to other formats through Calibre. I believe it would even be quite simple to convert existing .maff archives to .htmlz format with a batch file (to extract, add an index.html that redirects to the data directory, potentially add a metadata.opf file, and then rezip, but now with the .htmlz extension). Would it be worth switching Web Scrapbook from HTZ to HTMLZ, or adding HTMLZ?
- (2018-11-29, v:0.27.0) dan pot: legacy scrapbook
When I try to import legacy scrapbook content, the process seems to work, but in the end I cannot view the original saved pages. I can only view the tree.
- (2018-10-10, v:0.27.0) Brian O'Keefe: how to open map.html or frame.html directly using the Web Scrapbook icon
Thanks Danny-great extension if I can figure it out. I copied my ScrapbookX folder to my desktop (Linux and Chrome, BTW). I followed your instructions and can view my tree by clicking map.html or frame.html in the copied folder and choose "open with Chrome". Is there not a way to use the WebScrapbook icon to open the tree instead of having to navigate to the copied folder? Do I need to make a bookmark to the file map or frame.html? I assume this is simpler than I am understanding. Also, now I have the copied ScrapbookX folder on my Desktop. What do I do with that? Do I place it back into my .mozilla folder? Delete it (probably not)? Thanks again!!
- (2018-10-10, v:0.27.0) Brian O'Keefe: Great extension, I think
Thanks for this Danny. I followed your instructions to use my scrapbookX tree (Linux-Chrome BTW). I can open it from the copied Scrapbook file that is now on my desktop w/ map.html or frame.html and the tree shows in a tab in Chrome. Is there not a way to just click on something using the Web Scrapbook icon and automatically open the tree? Otherwise do I have to go into the copied Scrapbook X file everytime I want to view an archived page? That doesn't seem so simple. I guess I could make a book mark for that file? Am I missing something? Also, what do I do with the copied ScrapbookX folder on my desktop that contains the map and frame.html files?
- (2018-09-24, v:0.27.0) Татьяна Романова: Archive file does not open
I unchecked the "Use the file system API" option, but the HTZ does not display video in the browser. There were links to YouTube. (Win 10, Chrome 69.0.3497.100, ScrapBook 0.27.0)
- (2018-08-22, v:0.26.4) eng hooi Khoo: Unable to view archived files
When any archived file is picked or dragged into the "pick zip files" window, the page doesn't show; instead it only shows: Loading: '20180821153438911.htz'... Done.
- (2018-08-21, v:0.26.4) eng hooi Khoo: Unable to view archived files
When any archived file is picked or dragged into the "pick zip files" window, the page doesn't show; instead it only shows: Loading: '20180821153438911.htz'... Done.
- (2018-04-30, v:0.26.3) Steven Goss: scrapbook.rdf - additional question
Appears my earlier q was cut short. If you don't have plan to support scrapbook.rdf, would you welcome contribution by another developer to add the legacy scrapbook.rdf support??
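
Several posts above ask how URL filters are meant to be written, and one review quotes the option help text: each line is either a full URL (with characters following a "#" or space stripped) or a regular expression such as /^http://example\.com//. The Python sketch below shows one possible reading of that sentence. It is not the extension's actual code: the function names `parse_filter_line` and `url_allowed` are hypothetical, and the handling of the trailing flags (only `i` is recognized) is an assumption.

```python
import re


def parse_filter_line(line):
    """Turn one filter line into a predicate on URLs (illustrative reading)."""
    line = line.strip()
    # A line of the form /pattern/flags is treated as a regular expression.
    # The greedy group means the LAST slash closes the pattern, so
    # "/^http://example\.com//" yields the pattern "^http://example\.com/".
    match = re.match(r"^/(.*)/([a-z]*)$", line)
    if match:
        flags = re.IGNORECASE if "i" in match.group(2) else 0
        pattern = re.compile(match.group(1), flags)
        return lambda url: pattern.search(url) is not None
    # Otherwise the line is a plain URL; drop everything from the first
    # "#" or space onward, on both sides of the comparison.
    plain = re.split(r"[# ]", line, maxsplit=1)[0]
    return lambda url: re.split(r"[# ]", url, maxsplit=1)[0] == plain


def url_allowed(url, filter_text):
    """Return True if any non-empty line of filter_text matches url."""
    predicates = [parse_filter_line(l) for l in filter_text.splitlines() if l.strip()]
    return any(p(url) for p in predicates)
```

Under this reading, a filter line of `//shop//` is a regular expression whose pattern is `/shop/`, so `url_allowed("https://site.test/shop/item", "//shop//")` is true while a URL without `/shop/` in it is rejected.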